Executive Summary 



This project is about facial asymmetry, its connection to emotional expression, 
and methods of measuring facial asymmetry in videos of faces. The research was 
motivated by two factors: firstly, there was a real opportunity to develop a novel 
measure of asymmetry that required minimal human involvement and that im- 
proved on earlier measures in the literature; and secondly, the study of the rela- 
tionship between facial asymmetry and emotional expression is both interesting 
in its own right, and important because it can inform neuropsychological theory 
and answer open questions concerning emotional processing in the brain. The two 
aims of the research were: first, to develop an automatic frame-by-frame measure 
of facial asymmetry in videos of faces that improved on previous measures; and 
second, to use the measure to analyse the relationship between facial asymmetry 
and emotional expression, and connect our findings with previous research of the 
relationship. The project is best described as 80% investigatory and 20% software 
development. Since submitting the research review, the main accomplishments 
have been: 



• the original research review (which forms part of Chapter 2) has been
expanded with greater detail on Active Shape Models (Section 2.4.1) and Active
Appearance Models (Section 2.4.2) and a new section (2.4.3) on Procrustes
analysis. (However, it should be noted that the time spent on Chapter 2 was
no greater than 25% of the total time spent on the project.)

• two original measures of asymmetry were devised and implemented using an
Active Appearance Model library and OpenCV (Sections 3.4.1 and 3.4.2).
One of these measures achieved our first aim; i.e., it was automatic and
improved on previous measures in the literature.

• a measure of left-sided and right-sided facial movement (relative to a neutral
face) was devised and implemented (Section 4.3.1) and used as a novel
measurement of the strength of a subject's emotional expression. The measure
was also used to investigate whether one side of the face moved more than
the other during the expression of positive emotions (Section 4.3.2).

• the relationship between strength of emotional expression and degree of
asymmetry was plotted for subjects expressing laughter and happiness
(Figures 4.7 and 4.9). The results gave rise to a hypothesis for future research:
that magnitude of facial asymmetry increases with strength of emotional
expression, with the increases larger for strong expressions (such as laughter).

• a thorough critical evaluation of the project was conducted (Section 5.2) and
three interesting and fruitful directions for future research were identified
(Section 5.3).

Chapter 1 

Introduction 



This project is about facial asymmetry, its connection to emotional expression, and 
methods of measuring facial asymmetry in videos of faces. Psychologists have long 
been interested in facial asymmetry and its connections to - amongst other things 
- attractiveness, health and personality type. Separately, neuropsychologists have 
looked at the asymmetry of a moving and expressive face, in the hope that it can 
teach them about emotional processing in the brain (if one side of the face moves 
more during expression, then this might be because one side of the brain is more 
active during expression). Whilst measuring the asymmetry of static and neutral 
faces is important and interesting, our focus is on measuring the asymmetry of 
moving and expressive faces. 

Before the last decade, measurements of asymmetry were either performed di- 
rectly by humans (either by perceiving asymmetry by eye or by measuring asym- 
metry by hand) or by electromyography (detecting electrical activity in the facial 
muscles, due to their movement). Measurements by hand or eye are limited because 
they are time-consuming. Measurements by electromyography are limited because 
they measure changes in electrical activity rather than changes in the appearance 
of the face, and the two need not be perfectly correlated. Techniques from the field 
of computer vision provide the opportunity for a less labour-intensive measure of 
asymmetry, and the collection of larger sets of data. In the last few years there 
have been some studies on asymmetry and emotional expression that have used 
techniques drawn from computer vision, but they have been limited in number. We 
believe that there is a real opportunity to develop a novel measure of asymmetry 
that requires minimal human involvement and that improves on earlier measures. 
Furthermore, recent studies have tended to focus on the direction of asymmetry 
during emotional expression (whether one side moves more than the other) rather 
than on the magnitude of asymmetry. We believe that interesting results can be 
obtained by focussing on the magnitude of asymmetry and its relation to the type 
(e.g. happy or sad) and strength of emotional expression. It is these two beliefs 
that inform the aims of our research. 



1.1 Aims of research 

There are two aims of our research: 

• to develop an automatic frame-by-frame measure of facial asymmetry in 
videos of faces that improves on previous measures 

• to use the measure to analyse the relationship between facial asymmetry and 
emotional expression, and connect our findings with previous research of the 
relationship 



1.2 Structure of the report 

Excluding this introductory chapter, the report is divided into four chapters. 
Chapter 2 provides the background and context for our research and presents
the theory that we will rely on to build our measure of asymmetry. The chapter's
first section (Section 2.2) is concerned with demonstrating why facial asymmetry is
important and with reporting the results of previous research into the connections
between asymmetry and attractiveness, health, personality type and emotional
expression. The second section (Section 2.3) concentrates on more recent studies
that looked at the relationship between asymmetry and emotional expression. The
third and final section (Section 2.4) builds a research toolbox by presenting the
theory that will be drawn on in the following chapters.

Chapter 3 is directed towards the first aim of our research and describes in detail 
the design and development of two measures of asymmetry. Although the aim of 
the project is to develop a single measure, after developing the first measure, a 
second was developed and compared to the first. Even though one of the measures 
was ultimately discarded, its development helped us to realise some of the strengths 
of the measure that was preferred. We end the chapter by discussing the limitations 
of the preferred measure. 

Chapter 4 is directed towards the second aim of our research and uses the
measure developed in the preceding chapter to analyse two videos of subjects
displaying happiness and laughter. The data collected were analysed in several
ways. Section 4.2.2 charts asymmetry by frame number for the first subject and
allows us to identify frames of elevated asymmetry. Section 4.3.2 charts movement
of the left side of the face and movement of the right side of the face by frame
number (for the first subject) and allows us to establish whether one side moves
more than the other. In Section 4.3.3 we chart strength of emotional expression
against magnitude of asymmetry to see if we can discover a relationship. We end the
chapter by repeating the experiments for a second subject (Section 4.4) to see if
our results are replicated.

Chapter 5 is the conclusion. We review the report and project and discuss the 
strengths and weaknesses of the work undertaken. We refer back to the project's 
aims to see if they have been accomplished. Possible improvements to the work 
are discussed and future avenues of research are suggested. 

Chapter 2 

Background, context and 
relevant previous research 



2.1 About this chapter 

In this chapter we set the scene for the project by surveying previous research
from relevant areas. The chapter is divided into three sections. In Section 2.2 we
examine previous research by psychologists and neuropsychologists on facial
asymmetry that helps to show why we should be interested in facial asymmetry at all.
Traditionally, facial asymmetry has been of interest to two groups of researchers. 
Psychologists have been interested in the asymmetry of neutral (i.e. emotionless) 
faces and have connected symmetry to attractiveness, health and personality type. 
Neuropsychologists have been particularly interested in looking at how the level of 
facial asymmetry changes during emotional expression, because this connects with 
understanding if emotional processing in the brain is focussed on one particular 
side of the brain. Our project is more concerned with the latter, but a proper 
survey on facial asymmetry should not neglect the former, and so we discuss both. 



In Section 2.3 we turn our attention to more recent attempts to study facial
asymmetry - and how it changes during emotional expression - that use techniques
from the field of computer vision. We find that these attempts have been limited
in number and in scope. Sections 2.2 and 2.3 jointly motivate our project;
together they show why we should be interested in facial asymmetry and emotional
expression and why there is an opportunity to improve on previous measurement
techniques.



In the third and final section (Section 2.4) we build the research toolbox that 
will be used in our approach to measuring asymmetry. In particular, we introduce 
and describe the theory underlying Active Shape Models and Active Appearance 
Models - which can be used for modelling faces - and we prove an important 
theorem from Procrustes Analysis that we will use in Chapter 3. 

2.2 Psychology, neuropsychology and asymmetry 



2.2.1 Three kinds of asymmetry 



Although the human body exhibits a degree of symmetry across the midsagittal
plane (Figure 2.1), this symmetry is imperfect, both internally and externally.
Psychologists have classified the asymmetry into three kinds (Graham et al., 1994;
Van Valen, 1962). The first kind is Directional Asymmetry, which refers to
left-right asymmetry that is characteristic of a population. Some directional
asymmetries are clear and established for humans: for example, the heart is slightly
offset to the left side of the body. Other directional asymmetries remain as hypotheses.
For example, some studies have found that the larger side of the human face is
typically the left (Vig and Hewitt, 1975), whereas others have found that it is
typically the right (Simmons et al., 2004). It has been suggested that these
differences in results may be due to differences in gender or age of the subjects of
the relevant studies (Ercan et al., 2008).




Figure 2.1: The midsagittal plane is the vertical plane through the mid-line of the 
body, dividing it into two approximately symmetric halves. 



The second kind of asymmetry is known as Antisymmetry (Kowner, 2001).
Some traits in a population may be naturally dominant on one side of the plane of 
symmetry (and thus asymmetric); but, in contrast to traits displaying directional 
asymmetry, the side of dominance is unpredictable. All that is known is that the 
trait is typically asymmetric in the population. 

The final kind of asymmetry is known as Fluctuating Asymmetry (FA); the
term was first used almost 80 years ago (Ludwig, 1932). In contrast to directional
asymmetry and antisymmetry, fluctuating asymmetry is a property of individuals
rather than a property of populations. A trait in an individual is said to display
fluctuating asymmetry if it is asymmetric in the individual but typically symmetric
in the population. This type of asymmetry is known as fluctuating because the
direction of asymmetry appears to be random (Kowner, 2001). It has been shown
that FA is partly heritable, insofar as the magnitude of asymmetry is heritable; but
the direction of asymmetry is not genetically determined. (Figure 2.2 illustrates
the distinct distributions associated with the three kinds of asymmetry.)




Figure 2.2: Traits exhibiting fluctuating asymmetry have a mean asymmetry of
zero, with a normal distribution. Variation in individuals from this mean
symmetry is non-genetic. Traits exhibiting directional asymmetry show a skewed normal
distribution with a non-zero mean asymmetry. They are partly genetically determined
(i.e., dependent on genotypes). Traits exhibiting antisymmetry show either right-sided
asymmetry or left-sided asymmetry, but the direction of asymmetry is
unpredictable. (Image taken from Palmer and Strobeck (1992).)
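
The three distributions described above can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: the function name, the choice of "right minus left" as the signed score, and all numerical parameters are assumptions made purely for illustration, and directional asymmetry is modelled as a shifted (rather than skewed) normal distribution for simplicity.

```python
import numpy as np


def simulate_signed_asymmetry(kind, n=10_000, seed=0):
    """Draw n signed asymmetry scores (right minus left) for one trait.

    All distribution parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    if kind == "fluctuating":
        # Symmetric in the population: normal distribution centred on zero.
        return rng.normal(loc=0.0, scale=1.0, size=n)
    if kind == "directional":
        # One side is characteristically larger: the mean is shifted from zero.
        return rng.normal(loc=2.0, scale=1.0, size=n)
    if kind == "antisymmetry":
        # Individuals are typically asymmetric, but the dominant side is
        # unpredictable: a bimodal distribution with modes either side of zero.
        side = rng.choice([-1.0, 1.0], size=n)
        magnitude = rng.normal(loc=2.0, scale=0.5, size=n)
        return side * magnitude
    raise ValueError(f"unknown kind of asymmetry: {kind!r}")


for kind in ("fluctuating", "directional", "antisymmetry"):
    scores = simulate_signed_asymmetry(kind)
    print(f"{kind:12s} mean signed = {scores.mean():+.2f}, "
          f"mean absolute = {np.abs(scores).mean():.2f}")
```

Comparing the mean signed score with the mean absolute score separates the three cases: only directional asymmetry has a signed mean far from zero, while antisymmetry combines a signed mean near zero with a large absolute mean.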



FA is taken to be a measure of Developmental Instability (Van Valen, 1962;
Zakharov, 1981). Developmental instability refers to an organism's inability to
protect its development against small and random changes to its environment
(Lens et al., 2002). Research has shown that developmental instability is
negatively correlated with fitness components such as longevity and survival (Moller,
1997). Furthermore, reviews of the relationship between sexual selection and
developmental instability have indicated mate preference for individuals displaying
lower levels of FA (Moller, 1993; Thornhill, 1992; Watson and Thornhill, 1994).

2.2.2 Symmetry and attractiveness 



Numerous studies have investigated the relationship between human symmetry
and attractiveness to potential mates. Some have considered symmetry of the
body in general (Hume and Montgomerie, 2001; Rikowski and Grammer, 1999)
but most have focussed on symmetry of the face in particular, and this will be
the focus of our project. Earlier studies found that facial symmetry was not
positively related to facial attractiveness; in fact, in some studies, a negative relation
was found - slightly asymmetric faces were preferred to perfectly symmetric ones
(Kowner, 1996; Langlois et al., 1994; Swaddle and Cuthill, 1995). However, from
more recent studies, there is a growing body of evidence that facial symmetry is
positively correlated with ratings of attractiveness for both Western cultures
(Perrett et al., 1999; Rhodes et al., 1999a) and non-Western cultures (Little et al.,
2007; Rhodes et al., 2001a). Some authors have suggested that symmetric faces
are more attractive in virtue of their "averageness" (separate research has shown
that averaging human faces across a population tends to increase attractiveness
(Rhodes and Tremewan, 1996)). However, Rhodes et al. (1999b) have used
regression analyses to show that symmetry contributes to attractiveness even when the
effects of averageness are partialed out, suggesting that symmetry is independently
attractive.



Rhodes (2006) suggests that the reason that earlier studies found that facial
symmetry was not positively correlated with attractiveness is that they relied on
a flawed methodology. Namely, they simply reflected one hemiface to create a
perfectly symmetric face, which left certain features of the symmetric face
abnormally large or abnormally small. The approach of the later studies was to create two
perfectly symmetric faces - one from each hemiface - and then blend the two faces
together to create a perfectly symmetric face with the abnormalities smoothed out
(Perrett et al., 1999; Rhodes et al., 1999a,b). This symmetric face turned out to
be more attractive than its asymmetric parents. Figure 2.3 provides an example
of the results of the two approaches. In (c) and (d), symmetric faces are obtained
by reflecting the right and left hemifaces respectively. This results in
unnatural-looking features; for example, the man's mouth in (c) looks unnaturally small,
whereas his mouth in (d) looks unnaturally large. Image (b) is the result of blending the
left mirrored image with the right mirrored image, and is the approach taken by
later studies, such as Perrett and colleagues (1999). The large mouth of (d) and
the small mouth of (c) are blended to create an average and symmetric mouth.
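
The two ways of constructing a symmetric face can be sketched on a greyscale image array. This is a minimal illustration rather than the procedure used in the cited studies: it assumes the facial midline coincides with the vertical centre line of the image, it uses a plain pixel average to stand in for the blending step, and the function names are hypothetical. "Left" and "right" refer here to the halves of the image array, not to the subject's own left and right.

```python
import numpy as np


def mirror_hemifaces(face):
    """Return (left_mirrored, right_mirrored) symmetric images.

    `face` is a 2D array whose vertical centre line is assumed to be the
    facial midline. For odd widths the middle column is dropped.
    """
    width = face.shape[1]
    half = width // 2
    left = face[:, :half]
    right = face[:, width - half:]
    # Reflect each half about the midline and join it to its own reflection.
    left_mirrored = np.concatenate([left, left[:, ::-1]], axis=1)
    right_mirrored = np.concatenate([right[:, ::-1], right], axis=1)
    return left_mirrored, right_mirrored


def blended_symmetric(face):
    """Average the two mirrored images into one symmetric image."""
    left_mirrored, right_mirrored = mirror_hemifaces(face)
    return (left_mirrored.astype(float) + right_mirrored.astype(float)) / 2.0
```

For an image of even width, the pixel average above is the same as averaging the image with its own mirror reflection, which is why features that are too large in one mirrored image and too small in the other come out at an intermediate size.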

Figure 2.3: The image illustrates the different ways of creating symmetric faces.
(a) is the unaltered face; (c) and (d) are obtained by reflecting the right and left
hemifaces respectively; (b) is obtained by blending the left mirrored image with
the right mirrored image. (Image taken from Perrett et al. (1999).)

A recent paper has shown that symmetric faces are not only rated as more
attractive, but that symmetry influences the real-world selection of sexual
partners (Burriss et al., 2011). As discussed earlier, fluctuating asymmetry is taken
to be a measure of developmental instability, and so body symmetry (and
facial symmetry) may be an indicator of genetic quality. One explanation then
for the connections between facial symmetry, attractiveness and selection of
sexual partners is that using symmetry to guide selection has adaptive value (Fink
and Penton-Voak, 2002; Simmons et al., 2004; Thornhill and Gangestad, 1993). If
this explanation is correct, we would expect fluctuating asymmetry in particular to
show a negative correlation with attractiveness, rather than directional asymmetry
or antisymmetry (Rhodes, 2006). Some studies on symmetry and attractiveness
have attempted to isolate fluctuating asymmetry. Simmons and colleagues (2004)
found that directional asymmetry affected neither symmetry nor attractiveness
judgements, whereas fluctuating asymmetry affected both. Their conclusion was
that humans pick up on fluctuating asymmetry in faces and use it as an indicator
of lower genetic quality.

2.2.3 Symmetry and other connections 
Symmetry and health 

Several studies have shown that symmetric faces are perceived as being healthier
than less symmetric faces (Fink et al., 2006; Jones et al., 2001; Rhodes et al.,
2001b). However, interestingly, a link between symmetry and actual health has
not yet been established (Rhodes, 2006; Rhodes et al., 2001b). Hume and
Montgomerie (2001) found a positive association between body mass index and facial
asymmetry for females, but this did not extend to an association between reported
health problems and facial asymmetry. In response, Simmons et al. (2004) have
suggested that the cost of developmental instability may be reduced longevity or
fecundity rather than poor health. They point to a study by Henderson and Anglin
(2003) that has identified a positive relationship between facial attractiveness and
longevity, but acknowledge that further research is needed to better understand
how asymmetry influences fitness across a broad set of measures.



Symmetry and personality type 

Although not as heavily researched as the relationship between symmetry and
attractiveness, there has been some recent interest in connections between facial
symmetry and perceived and actual personality types. For example, Noor and
Evans (2003) investigated the relationship between facial symmetry and perceived
personality across five dimensions of personality (known as the "big-five" factors
(Digman, 1990)): neuroticism, extraversion, openness, agreeableness and
conscientiousness. Based on the assumption that symmetry is used as an indicator
of genetic quality, they hypothesised that more symmetric faces would be more
likely to be perceived as possessing the desirable personality traits - extraversion,
openness, agreeableness and conscientiousness - while less symmetric faces would
be more likely to be perceived as possessing the undesirable trait - neuroticism.
Their results - albeit from a small sample size - partially confirmed their
hypothesis; they found that asymmetric faces were rated as significantly more neurotic,
less agreeable and less conscientious. Fink et al. (2005) replicated the research of
Noor and Evans but for actual personality rather than perceived personality. They
found a positive association between facial symmetry and actual extraversion, but
- unexpectedly - a negative association between symmetry and actual openness.
They suggested that further research was necessary to confirm or disconfirm these
links. A similar experiment involving a larger sample of participants also
found a positive association between symmetry and extraversion (Pound et al.,
2007).



The causal connection between facial symmetry and extraversion and other
personality traits is as yet neither confirmed nor understood. We have seen that
symmetry is linked to developmental stability and genetic quality, but it is not
clear whether it should be linked to actual personality traits. One possibility is
that hormones such as testosterone and oestrogen affect both facial symmetry and
personality jointly, resulting in certain associations. But further research is needed
to draw any definite conclusions (Fink et al., 2006).



2.2.4 Symmetry and facial expressions 

Our discussion so far has concentrated on the symmetry of motionless faces with
neutral expressions. However, there has also been extensive research on the
symmetry - and lack of symmetry - of faces during emotional expression. One of the
motivations for this research is that it connects with theory concerning cerebral
hemispheric specialisation; the human brain is divided into two hemispheres - the
left and right hemispheres - and hemispheric specialisation investigates the extent
to which cognitive tasks are separated between these hemispheres. It has been
well established that the right hemisphere exerts more control over the muscles of
the left hemiface than the left hemisphere and vice versa (Rinn, 1984). Thus, if it
is also the case that emotional expressions have higher intensity on one hemiface
than the other, then this could indicate that the respective hemisphere (i.e. right
hemisphere for left hemiface and vice versa) is the dominant controller of emotional
displays on the face.

The first mention of asymmetry during emotional expression dates back to
Darwin (1872), who noted that, from a sample of four Australian natives who were
asked to sneer, two displayed the left canine tooth, one displayed the right, and the
other displayed no asymmetry. Nearly seventy years later, Lynn and Lynn (1938)
performed the first detailed study of facial asymmetry during emotional expression
and found that although the majority of their subjects did not display significant
asymmetry, some of their subjects did. They introduced the term "facedness" to
correspond to the term "handedness". A person with left-facedness is a person
whose left hemiface is dominant (i.e., shows greater movement) during emotional
expression.



Ekman and Friesen (1969) hypothesised the existence of seven universal (i.e.,
cross-cultural) emotional facial behaviours to signal seven different emotions:
happiness, sadness, anger, fear, surprise, disgust and interest. During the following
decade they worked on coding facial expressions and developed the Facial Action
Coding System (Ekman and Friesen, 1978), which works by describing facial
expressions as conjunctions of action units (examples of action units are parting the
lips or lowering the brow - see Figure 2.4 for visual examples).



Figure 2.4: Examples of some action units from the Facial Action Coding System
(image taken from Zhang et al. (2008)).

At the same time, there was a resurgence of interest in the asymmetry of
emotional expression and the start of systematic research of the field. Borod et al.
(1998) produced an extensive meta-analysis of research prior to 1998. They
considered the results of 82 observations from 35 journal articles to see if areas of
agreement between the results could be discerned. Of the 82 observations, 27
found no significant asymmetry during emotional expression, 48 found left
hemiface dominance (i.e., left-facedness) and only 7 found right hemiface
dominance. For examples of observations that found left hemiface dominance
see (Borod et al., 1990; Dopson et al., 1984); for an example of observations
that found right hemiface dominance see (Sirota and Schwartz, 1982); and for
examples that found no significant asymmetry see (Kop et al., 1991; Strauss and
Kaplan, 1980).


Borod et al. (1998) also divided the 82 observations according to various
criteria to investigate further hypotheses. The elicitation condition of an emotional
expression refers to the circumstances that brought it about. Posed (or voluntary)
expressions are deliberately produced by the subject, whereas spontaneous (or
involuntary) expressions are instinctual reactions to a stimulus. Since there have
been suggestions in the literature that different neuroanatomical systems are
responsible for posed and spontaneous expressions (see, for example, Borod and Koff
(1984) for a discussion), Borod and colleagues divided the 82 observations
according to elicitation condition to see if the condition made a difference to the degree or
direction of asymmetry in emotional expressions. They found that the literature
showed no such significant difference (Borod et al., 1997), with the left hemiface
tending to show greater emotional expressivity for both posed and spontaneous
expressions.

There are two primary hypotheses about cerebral hemispheric specialisation for
emotional processing. One is the right hemisphere hypothesis which holds that the
right hemisphere is dominant for all emotions irrespective of their valence (Borod
et al., 1988). (The valence of an emotion refers to its property of being either
positive/pleasant - emotions such as happiness, interest or pleasant surprise - or
negative/unpleasant - emotions such as fear, anger or sadness; for a variation of
the positive/negative distinction see the approach/withdrawal distinction
(Demaree et al., 2005).) The other hypothesis is the valence hypothesis which holds that
the right hemisphere is dominant for negative/unpleasant emotions whereas the
left hemisphere is dominant for positive/pleasant emotions (Sackeim et al., 1982).
Borod and colleagues (1997) divided the 82 observations according to the valence
of the emotional expression observed. They found that negative emotional
expressions were more likely to show a left hemiface dominance than positive emotions,
but that for both negative and positive emotional expressions, a left hemiface
dominance was more likely than a right hemiface dominance. Their conclusion
was that there is evidence that emotional valence does influence the direction of
asymmetry but that support for the valence hypothesis - which predicts right
hemiface dominance for positive emotional expressions - is weaker than support
for the right hemisphere hypothesis, which predicts left hemiface dominance for
both positive and negative emotional expressions. They suggested that a further
possibility is that positive emotional expressions are mediated by both the left and
right hemispheres (Borod et al., 1997).

Borod and colleagues (1998) considered two further criteria besides the
elicitation condition and emotional valence; namely, gender of the subject and the
technique used to measure the degree of asymmetry. With respect to gender, an
earlier study had found that females were more likely than males to show right
hemisphere dominance for emotional processing (Ladavas et al., 1980), and this
provided motivation for Borod and colleagues' meta-analysis. They selected the
33 observations where it was possible to separate the results by gender and found
that of these 33 observations, 23 showed no significant difference between genders;
6 showed that males were more left-faced than females; and 4 showed that females
were more left-faced than males. They concluded that there was no support for
the hypothesis that males were more left-faced than females, nor vice versa (see
Figure 2.5).

Figure 2.5: The results of 33 observations comparing male facedness to female
facedness, separated by emotional valence. Taken from the meta-analysis by Borod
et al. (1998).



The final criterion considered by Borod and colleagues was the technique used
to measure the degree of asymmetry. From the 82 observations considered in their
meta-analysis, they identified four categories of measurement techniques. The most
popular category was using a group of human raters to decide whether one hemiface
was more expressive than the other. The second most popular category involved
muscle quantification - i.e., making measurements of the movement of muscles away
from their starting position during emotional expression. The other two categories
- both less popular - were electromyography (mostly on the zygomatic muscle) and
self-report. (Electromyography refers to the technique of measuring muscle activity
by recording the electrical activity of those muscles. See Sirota and Schwartz
(1982) for an example.) Borod and colleagues found that measurement technique
made a significant difference to the results of observations, with the technique of
using human raters showing a strong tendency to conclude left-facedness, but the
techniques of electromyography and self-report tending to find no facial asymmetry.
Muscle quantification techniques tended to find left-facedness, but not as frequently
as techniques using human raters (see Figure 2.6).

Figure 2.6: The results of 82 observations as a function of measurement technique
and emotional valence. Taken from the meta-analysis by Borod et al. (1998).



2.3 Measuring facial asymmetry and computer vision 



The four categories of measurement technique identified by Borod et al. (1998) 
are all limited in one respect or another. Using human raters, self-report or muscle 
quantification requires human measurement by hand or by eye. Human measure- 
ment is subject to human error, but, more importantly, is highly time-consuming 
and limits the amount of data that can be collected. Measurement using elec- 
tromyography need not be time-consuming but is limited insofar as it measures 
changes in electrical activity rather than changes in the appearance of the face, 
and the two are not perfectly correlated. 

Borod and colleagues were writing in 1998 and computer vision has since
developed substantially. In recent years there have been some, but not many, applications
of techniques from the field to the problem of measuring facial asymmetry during
emotional expression (Desai, 2009; Nicholls et al., 2004; Richardson et al., 2000).
In this section we report on two of the more recent ones.



2.3.1 3D measurement by Nicholls et al. 

Nicholls and colleagues used a 3D physiognomic range finder to capture the facial
expressions of their subjects. They were then able to generate 3D images of their
subjects which they could use to make 3D measurements of the facial asymmetry
under facial expressions. They did this by rotating the images to the left and
right in turn (by 35 degrees in each direction) and overlaying the image with
emotion expressed onto a baseline image with a neutral pose. From this they
created colour maps showing the amount of movement in different parts of the
left and right sides of the face (see Figure 2.7 for an example) and could calculate
measures of the overall movement on each side of the face. Their experiment only
involved measurements for posed happy and sad expressions. They found that
happy expressions involved more facial movement than sad expressions and that
both showed a left hemiface dominance, but with the dominance much stronger
for sad expressions than happy expressions. (Notice that this is consistent with
the findings of Borod et al. (1998), as discussed earlier.)



2.3.2 Measuring pixel changes by Desai 

Desai (2009) videotaped the posed facial expressions of 48 subjects, asking them to
produce expressions of happiness, sadness, anger, fear, surprise and disgust. The
videos were then digitised so that they could be analysed frame by frame and pixel
by pixel. Following an earlier study by Richardson and colleagues (2000), Desai
quantified the degree of facial movement between consecutive frames by computing
differences in pixel intensity for all pixels (the frames were 640 x 480 pixels at
256 levels of grayscale). The entropy was calculated as the sum of pixel intensity
differences across all corresponding pixels in the two consecutive frames. Entropy
could then be charted by frame to identify pairs of consecutive frames where the
overall change in pixel intensity was greatest. Working on the assumption that
"changes in the surface lighting of the face reflect movement", entropy was taken
as an objective measure of facial movement. By dividing the face into left and right
hemifaces, and recording entropy scores for each hemiface separately, facial
asymmetry could also be measured (see Figure 2.8 for an example of charting entropy).
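As a rough illustration of the entropy measure just described, the following Python sketch sums grey-level differences between two consecutive frames and splits the score at a vertical midline. The function names, the use of absolute differences, and the fixed midline column are assumptions made for illustration; they are not taken from Desai (2009).

```python
def frame_entropy(prev_frame, curr_frame):
    """Sum of absolute grey-level differences between two equally sized frames.

    Frames are lists of rows; each row is a list of integers in 0..255.
    Taking absolute values is an assumption of this sketch.
    """
    return sum(
        abs(p - c)
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
    )


def half_frame_entropy(prev_frame, curr_frame, midline):
    """Entropy for the image columns left and right of a (hypothetical) midline.

    Returns (entropy of columns [0, midline), entropy of columns [midline, width)).
    Which image half is the subject's left hemiface depends on whether the
    video is mirrored, so that mapping is left to the caller.
    """
    left = frame_entropy(
        [row[:midline] for row in prev_frame],
        [row[:midline] for row in curr_frame],
    )
    right = frame_entropy(
        [row[midline:] for row in prev_frame],
        [row[midline:] for row in curr_frame],
    )
    return left, right
```

Charting these two values frame by frame would give a plot of the kind shown in Figure 2.8.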



Using this approach, Desai found that males displayed left hemiface dominance for
all emotional expressions, whereas females displayed less significant left hemiface
dominance, and even showed right hemiface dominance for expressions of sadness.
This latter result runs counter to the findings of Borod and colleagues (1998).
Desai's explanation for the difference in asymmetry between males and females was
that emotional processing may be less lateralised in the female brain than in the
male brain, as seems to be the case for linguistic processing (e.g., see Shaywitz
et al. (1995)).

Figure 2.7: (a) shows a posed sad expression rotated by 35 degrees in two directions
to display the left side of the face (top) and the right side of the face (bottom);
(b) shows colour maps indicating the movement of each part of the face relative to
the baseline face (i.e., face with neutral expression - not shown here). The left side
of the face (top) shows more movement (indicated by yellow) than the right side
of the face (bottom), which is predominantly green. (Image taken from Nicholls
et al. (2004))

Figure 2.8: Entropy (which is a measure of facial movement between consecutive
frames) charted for both the right and left hemifaces, as the face displayed a happy
expression. Peak entropy is at frame 17 and is likely to correspond - according to
Desai - to the frame in which the facial expression of happiness was strongest. In
this case the values of entropy are close for both hemifaces, and so no significant
facial asymmetry is detected. (Image taken from Desai (2009))
[Plot legend: Right Hemiface; Left Hemiface]



2.4 Building a research toolbox 



Both Nicholls et al. (2004) and Desai (2009) admit to limitations of their measures.
Nicholls and colleagues stated that their measure was only sensitive to movement
perpendicular to the surface of the face, and Desai assumed that "changes in the
surface lighting of the face reflect movement", meaning that the measure would be
affected by lighting variation. Furthermore, Desai's analysis of videos was limited
to short snippets of a few seconds, and Nicholls and colleagues looked only at
whether one side of the face moved more than the other, and did not employ a
measure of asymmetry in their analysis.

We now present the theory that will be used to develop an automatic measure
of facial asymmetry that improves on earlier measures, and that can be used to
analyse the relationship between facial asymmetry and emotional expression in
ways neglected by Nicholls et al. (2004) and Desai (2009).



2.4.1 Active Shape Models 

Active Shape Models (ASMs) are statistical models - developed in the 1990s - that
have been used in a variety of contexts. For example, ASMs have been used in
medical imaging to analyse MR images of the brain and locate structures (Hill
et al. 1994) and to identify bones in radiographs of hip replacements (Kotcheff
et al. 1996). Other examples of applications are their use in visual speech
recognition (Luettin et al. 1996), person identification from faces (Lanitis et al.
1997), facial expression recognition (Lanitis et al. 1997), and in classifying crops
in images (Persson and Astrand, 2008).



Training 



ASMs were first developed by Cootes and Taylor (Cootes and Taylor, 1993). There
are two steps involved in their use: the training step and the fitting step. The
training step requires a training set of images of the relevant kind of object. For
example, if we are interested in face recognition then we need a set of images of
faces. Training sets should contain a variety of images of the relevant kind of
object, so that shape variations can be fully modelled (Cootes et al. 1995). For
instance, if we are building a training set of human faces that will be used for person
identification from faces with various expressions, presented at different angles,
then we should ensure that we include faces of different subjects, with various
facial expressions, presented at different angles, in our training set. Generally a
larger training set is preferred (for example, Lanitis et al. (1997) use 160 images of
faces in their training set), but of most importance is the variety of images within
the set. Adequate, simpler models can be built from as few as 10-20 images in the
training set.

Each image in the training set must be annotated to obtain a set of points which
represents the relevant shape (for example, see Figure 2.9). Annotation is typically
done by hand, but methods have been developed for automatic annotation (Souza
and Udupa, 2005). Since the training images are 2-dimensional, each set of points
can be represented by a vector of 2n dimensions, where n is the number of points
marked in each training image. If each point is represented as (x_i, y_i) then we can
write the vector as:

    x = (x_1, \ldots, x_n, y_1, \ldots, y_n)

Figure 2.9: A training image of a face annotated for use in an Active Shape Model.
(Image taken from Lanitis et al. (1997))

We can think of each set of points as representing a shape. The mean shape
is computed by averaging the N vectors (where N is the number of images in
the training set) and the mean shape is centred on the origin (by translation)
and scaled so that the sum of the squares of its coordinates is 1. Each shape in
the training set is then aligned to the mean shape using rotations, translations
and scaling. This is typically accomplished using Procrustes analysis (we present
Procrustes analysis in more detail in Section 2.4.3). Alignment in this instance
means minimising the sum of the squares of the distances between corresponding
points of each shape and the mean shape. If the two shapes are
(x_1, \ldots, x_n, y_1, \ldots, y_n) and (x'_1, \ldots, x'_n, y'_1, \ldots, y'_n)
then we want to minimise D, where:

    D = \sum_{i=1}^{n} \left[ (x_i - x'_i)^2 + (y_i - y'_i)^2 \right]



After alignment, we represent the set of N training shapes as a matrix with 2n
rows and N columns. Call this matrix M. The matrix represents the range of
shapes in the training set. If the j-th training shape is represented by the vector
(x_{1j}, \ldots, x_{nj}, y_{1j}, \ldots, y_{nj}) then the matrix M has the form:

    M = \begin{pmatrix}
          x_{11} & \cdots & x_{1N} \\
          \vdots &        & \vdots \\
          x_{n1} & \cdots & x_{nN} \\
          y_{11} & \cdots & y_{1N} \\
          \vdots &        & \vdots \\
          y_{n1} & \cdots & y_{nN}
        \end{pmatrix}

Principal Components Analysis (PCA) (Johnson and Wichern, 1988) is applied to
M to calculate the set of principal components for the set of shapes in the training
set. The first step in PCA is to calculate the 2n by 2n covariance matrix for M.
The covariance matrix is given by:

    S = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T

where x_i = (x_{1i}, \ldots, x_{ni}, y_{1i}, \ldots, y_{ni})^T is the i-th training
shape and \bar{x} is the mean shape given by

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

The second step is to calculate the 2n (eigenvalue, eigenvector) pairs for S. An
(eigenvalue, eigenvector) pair for S is a (scalar, vector) pair, (\lambda, v), such that

    S v = \lambda v

For large n, calculating the eigenvalues and eigenvectors is non-trivial. There exist 
various algorithms and software implementations (in, for example, MATLAB and 
OpenCV) that perform the calculation. 
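As an illustration of the two PCA steps, the following Python sketch uses NumPy to compute the covariance matrix of the training shapes and its (eigenvalue, eigenvector) pairs, ordered from largest to smallest eigenvalue. The function name `shape_pca` and the 1/N normalisation of the covariance matrix are assumptions of this sketch rather than details of any particular ASM implementation.

```python
import numpy as np


def shape_pca(M):
    """PCA on a 2n x N matrix whose columns are aligned training shapes.

    Returns (mean_shape, eigenvalues, eigenvectors), with eigenvalues sorted
    in descending order and eigenvectors stored as matching columns.
    """
    M = np.asarray(M, dtype=float)
    num_shapes = M.shape[1]
    mean_shape = M.mean(axis=1)
    centred = M - mean_shape[:, None]
    # Covariance matrix S (2n x 2n), using the 1/N convention (an assumption).
    S = centred @ centred.T / num_shapes
    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]
    return mean_shape, eigenvalues[order], eigenvectors[:, order]
```

For a toy training set of single-point shapes lying along the line y = x, the first eigenvector returned points along that line and the second eigenvalue is (numerically) zero.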

The eigenvectors are ordered by the magnitude of their associated eigenvalues;
the eigenvector with the largest eigenvalue is the principal component, and the
eigenvector with the second largest eigenvalue is the second principal component,
and so on. Although there are 2n eigenvectors, it may be that the first t of these
(for some t less than 2n) capture most of the shape variation in the set of shapes.
(Imagine, for example, a set of points in 2 dimensions that are clustered along a
straight line; although 2 vectors are needed to capture all of the variation of the
points, a single vector along the straight line will capture most of the variation.)
The eigenvalue of an eigenvector equals the variance that the eigenvector accounts
for; this means that the sum of the eigenvalues is the total variance of the training
data. The value t is chosen so that the first t eigenvectors represent nearly all, but
not necessarily all, of the variation; i.e., t is chosen so that:

    \sum_{i=1}^{t} \lambda_i \geq p V

where \lambda_i is the i-th largest eigenvalue, p is the proportion of variation that
we wish the t eigenvectors to capture (e.g., p = 0.98) and V is the total variance of
the training data (i.e., V = \sum_{i=1}^{2n} \lambda_i).
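The choice of t can be sketched in a few lines of Python. The function name `choose_num_modes` is hypothetical, and the sketch assumes that the smallest t satisfying the condition is the one wanted.

```python
def choose_num_modes(eigenvalues, p=0.98):
    """Smallest t such that the t largest eigenvalues sum to at least p * V,
    where V is the sum of all eigenvalues (the total variance)."""
    ordered = sorted(eigenvalues, reverse=True)
    total_variance = sum(ordered)
    running = 0.0
    for t, value in enumerate(ordered, start=1):
        running += value
        if running >= p * total_variance:
            return t
    # Fallback for rounding effects: keep every eigenvector.
    return len(ordered)
```

For example, with eigenvalues 6.0, 3.0, 0.9 and 0.1 and p = 0.98, the first two eigenvalues account for 90% of the variance and the first three for 99%, so three modes are retained.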

The t eigenvectors are retained - and are used in the ASM - and the other
eigenvectors are discarded. New shapes can be generated by taking the mean
shape and adding linear combinations of the t eigenvectors. If A is the matrix of
t eigenvectors, then a new shape, x, generated by the ASM, has the form:

    x = \bar{x} + A b

where \bar{x} is the mean shape and b is a t-dimensional vector containing the shape
parameters for the new shape. Training data can be used to put constraints on
likely values for b. This is done by recording the range of values of b from the
training data. Assuming that the range of possible values of b can be modelled by a
Gaussian distribution, we can calculate the mean and variance for this distribution
from the training data, and use these to set limits for a plausible range of expected
values of b.
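A minimal Python sketch of shape generation and of a plausibility check on b follows. It assumes that each shape parameter has zero mean over the training set with variance equal to the corresponding eigenvalue, and it uses a limit of three standard deviations; the factor of three and the function names are assumptions of this sketch, not values stated above.

```python
import numpy as np


def generate_shape(mean_shape, A, b):
    """Return the mean shape plus the linear combination A b of the retained eigenvectors."""
    return np.asarray(mean_shape, dtype=float) + np.asarray(A, dtype=float) @ np.asarray(b, dtype=float)


def is_plausible(b, eigenvalues, num_sd=3.0):
    """True if every parameter lies within num_sd standard deviations of zero.

    The standard deviation of the i-th parameter is taken to be the square
    root of the i-th eigenvalue (an assumption of this sketch).
    """
    limits = num_sd * np.sqrt(np.asarray(eigenvalues, dtype=float))
    return bool(np.all(np.abs(np.asarray(b, dtype=float)) <= limits))
```

Varying a single entry of b while holding the others at zero produces the modes of variation illustrated in Figure 2.10.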



Figure 2.10 shows a set of face shapes generated by an Active Shape Model.
The top row shows shapes generated by varying the contribution made by the first
eigenvector (i.e., the eigenvector with the largest associated eigenvalue). This is
called the first mode of variation. The middle face is the mean shape (zero
eigenvector contribution). The second row shows shapes generated by varying the
contribution made by the second eigenvector (second mode of variation), and so on.
Different eigenvectors produce different behaviours. For example, the first mode of
variation tilts the face up and down, whereas the fifth changes the facial expression.



Figure 2.10: The image shows a set of face shapes generated by an Active Shape
Model in (Lanitis et al. 1997). In the top row new shapes are generated by adding
multiples of the principal eigenvector (i.e., largest associated eigenvalue) to the
mean shape. In the second row new shapes are generated by adding multiples of
the second principal eigenvector to the mean shape.
[Row labels: 1st mode; 2nd mode; 3rd mode; 4th mode; 5th mode; 6th mode]



Fitting 

Once an ASM has been created from a training set, the model can be used to
attempt to fit new images. Suppose we are presented with a new image. The goal
of fitting is to find the vector of shape parameters, b, the rotation, \theta, the scale
factor, s, and the translation vector, (x_t, y_t), such that the shape x comes closest
to fitting the image, where:

    x = T(\bar{x} + A b)

(T represents the transformation that translates by (x_t, y_t), scales by s and rotates
by \theta.) To make sense of this we need a metric that quantifies the goodness of the
shape's fit to the image. Provided that we have a means of detecting edges in
the image and defining boundaries, we can use the sum of the squared distances
between each point in the shape and the nearest boundary point of the image
(Cootes et al. 1995). Figure 2.11 provides an illustration of this.



Figure 2.11: The goodness of a shape model's fit to an image is measured by
summing the squares of the distances between each model point and the nearest
image boundary. (Image taken from Cootes and Taylor (2001a))
[Diagram labels: Model Point (X,Y); Model Boundary; Normal to Model Boundary;
Nearest Edge on Normal (X',Y'); Image Object]



Fitting is performed iteratively. For a new image, the first shape used is the
mean shape. If fitting is being applied to a video, then the shape that was used
to fit the previous frame is used as the starting shape. Call the candidate model
points x. Perpendicular lines from each model point are extended until edges are
detected (see Figure 2.12). The point where the perpendicular intersects the edge
is recorded. The collection of these points is called the set of image boundary
points (call them y). Procrustes analysis is used to find the (\theta, (x_t, y_t), s)
combination whose associated transformation, when applied to x, minimises the
distance between y and x. The inverse of this transformation is applied to y. The
shape parameters, b, are then updated using:

    b = A^T (y - \bar{x})

The constraints on b that were derived in the training stage are applied to b
(i.e., if b does not fall within the plausible range of values that was derived from
the training data, it is scaled so that it does). The distance between the shape
parameters that we started with and the shape parameters that we ended with is
measured. If this is above a certain threshold, then the same steps are repeated
using the updated parameters. If the distance is below the threshold, then the
fitting is completed. (The threshold is determined by the fitting accuracy we
desire and the time allowed for each fitting. It is chosen by trial and error.)

Figure 2.12: Perpendiculars are extended from model points to find the nearest
relevant boundary points. Changes in grayscale pixel information are used to detect
boundaries. Training information is used to teach the model to correctly identify
boundaries by sampling the pixel information that surrounds the landmarks of the
training images. (Image taken from Cootes and Taylor (2001a))
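The parameter update, the constraint step and the convergence test can be sketched in Python as follows. The sketch assumes that the boundary points y have already been mapped into the model frame, that the columns of A are orthonormal eigenvectors, and that the plausible range is three standard deviations per parameter (with variance equal to the eigenvalue). The function names and the uniform rescaling rule are illustrative assumptions rather than the method of any particular library.

```python
import numpy as np


def update_shape_parameters(y_aligned, mean_shape, A, eigenvalues, num_sd=3.0):
    """Project aligned boundary points onto the model and rescale b if needed."""
    A = np.asarray(A, dtype=float)
    residual = np.asarray(y_aligned, dtype=float) - np.asarray(mean_shape, dtype=float)
    b = A.T @ residual
    limits = num_sd * np.sqrt(np.asarray(eigenvalues, dtype=float))
    abs_b = np.abs(b)
    nonzero = abs_b > 0
    factor = 1.0
    if np.any(nonzero):
        # Shrink b uniformly until every parameter is inside its limit.
        factor = min(1.0, float(np.min(limits[nonzero] / abs_b[nonzero])))
    return factor * b


def has_converged(b_old, b_new, threshold):
    """True if the parameters moved by less than the chosen threshold."""
    difference = np.asarray(b_new, dtype=float) - np.asarray(b_old, dtype=float)
    return bool(np.linalg.norm(difference) < threshold)
```

In a full fitting loop these two functions would be called once per iteration, after the boundary search and the pose alignment, until `has_converged` returns true.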



Figure 2.13 illustrates fitting an ASM to an image of a face. In this case, the
result can be used to locate facial features. Fitting was completed in 18 iterations
and in under a second (Cootes and Taylor, 2001b). As discussed earlier, ASMs have
been used to locate structures in MR images of the brain (Hill et al. 1994), and
to identify bones in radiographs of hip replacements (Kotcheff et al. 1996). They
can also be used in facial expression identification, by fitting faces and determining
whether the fitted model corresponds to particular expressions in the training set
(i.e., whether the model parameters that fit the face are close to parameters which
represent a particular expression in the training set) (Lanitis et al. 1997).

Figure 2.13: Fitting an ASM to an image of a face. (Image taken from Cootes and
Taylor (2001b))
[Panels: Initial; After 2 iterations; After 6 iterations; After 18 iterations]



2.4.2 Active Appearance Models 



Active Appearance Models (AAMs) are an extension to ASMs developed by Cootes
et al. (1998) which, like ASMs, have had many applications, such as use in medical
imaging (Babalola et al. 2008; Roberts et al. 2009), [...] (Edwards et al. 1998) and
facial expression i[...].

They extend ASMs by modelling texture as well as shape. This is done by training
the model for texture as well as for shape.



Training 

Training an AAM starts by training an ASM; i.e., as with ASMs, training images
are annotated, and Principal Components Analysis (PCA) is used to compute the
modes of variation for the set of training shapes. Following this, each image in
the training set is warped to the mean shape, to effectively factor out the shape
information so that the texture information can be sampled independently. The
gray-level information from each warped training image is sampled. Each training
image gives rise to a vector, g_i, containing the gray-level information from the
warped training image. The vectors are normalised to minimise the effect of global
lighting variation across the training images. PCA is then applied to the set of
normalised vectors to build a model of the gray-level information. As with PCA on
the shape data, we are then able to express texture vectors as linear combinations
of the eigenvectors added to the mean vector:

    g = \bar{g} + A' b

where \bar{g} is the mean of the normalised vectors, A' is the matrix of eigenvectors
and b is the vector of texture parameters.

Because there is likely to be interdependence between shape and texture
information, PCA is applied to the combined shape and texture information, to further
simplify the model. Each training image has an associated set of shape parameters,
b_{is}, and an associated set of texture parameters, b_{ig}. The shape parameters are
scaled and then placed in a vector with the texture parameters. This gives rise to
a set of vectors, b_i, where

    b_i = \begin{pmatrix} W b_{is} \\ b_{ig} \end{pmatrix}

and W is a scaling matrix. (The reason why we apply W is simply that shape
parameters have units of distance whereas texture parameters have units of intensity
and the two are therefore not directly comparable. Cootes and Taylor (2001a)
suggest defining W = rI where I is the identity matrix and r is the square root of
the total intensity variation divided by the total shape variation.)

PCA is applied to the set of vectors, b_i, and yields a matrix of eigenvectors,
A'', such that each b_i can be expressed as:

    b_i = A'' c

where c is the vector of appearance parameters. (Notice that there is no mean element
in this formula; the reason is that the shape and texture parameters both have
zero mean.) Shapes and textures can then be expressed using the appearance
parameters. For example, since a texture can be written as:

    g = \bar{g} + A' b

and since the texture parameters b can be written as:

    b = A''_g c

where A''_g denotes the rows of the matrix A'' that relate to texture (i.e., the bottom
t rows of the matrix, where t is the number of texture parameters), we have:

    g = \bar{g} + A' A''_g c

Similarly, shapes can be expressed in terms of the appearance parameters, where:

    x = \bar{x} + A W^{-1} A''_s c

where A''_s denotes the rows of the matrix A'' that relate to shape (and W is the
scaling matrix from above).

New, complete images (i.e., with shape and texture) can be generated by the
AAM by using the texture model to generate a mean shape face with texture, and
then warping this face to give it shape by using the shape model. Figure 2.14
is analogous to Figure 2.10 but for an AAM rather than an ASM. Each set of 3
images shows the effect of a mode of variation. The middle image in each set of
3 is the mean image (mean shape and mean texture) and the adjacent images are
the effect of the mode of variation.

Figure 2.14: The images illustrate 4 of the modes of variation for an AAM trained
on 400 images of faces. (Image taken from Cootes and Taylor (2001a))

Fitting



As with ASMs, Active Appearance Models can be fitted to images iteratively (see
Cootes et al. (1998) for further details). The goodness of fit of a synthesised image
generated by the model is measured by taking the square of the norm of \delta I, where

    \delta I = I_i - I_m

and I_i is the vector of gray-level values for the target image and I_m is the vector
of gray-level values for the synthesised image.

Given a new target image, fitting is performed as follows. The starting image
is either the mean image (i.e., mean shape and mean texture) or, if we are fitting
a video, the fitting from the previous frame is used as the starting image. The
error vector \delta I = I_i - I_m and error value |\delta I|^2 are calculated. A suggested
adjustment to the appearance parameters is then computed. The suggested adjustment
is derived from information collected from the training data. It turns out that the
suggested adjustment, \delta c, (remember that c denotes the appearance parameters)
can be linked to the error vector by the relationship:

    \delta c = A \delta I

where A is a matrix that can be found by using the training data (where we know
the correct appearance parameters) to test different values of \delta c (by perturbing
the appearance parameters by small amounts) and observing the effect on \delta I.
The appearance parameters are updated according to the suggested adjustment
multiplied by a scalar, k, where k = 1. That is, the appearance parameters become
c - k \delta c. The error value is then recalculated. If it is less than the original error,
then we return to the start of the iterative process and recalculate the error vector.
If it is greater than the original error, then we try the adjustment with k = 0.5.
If the error is now less than the original error, then we return to the start of the
iterative process. If not, we try k = 0.25. If at this point we find that the error
is not improved then convergence is deemed to have occurred and the appearance
parameters are accepted.
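The damped update can be sketched in Python as below. The callables `error_vector` (which returns the residual vector for given appearance parameters) and `predict_update` (which maps a residual to a suggested adjustment) are hypothetical stand-ins for the image-sampling and learned-regression parts of a real AAM, and the `max_iterations` cap is an added safeguard that is not part of the description above.

```python
def fit_appearance(c, error_vector, predict_update, max_iterations=100):
    """Iteratively refine appearance parameters c, trying k = 1, 0.5, 0.25.

    Returns (final parameters, final squared error).
    """
    def squared_norm(vector):
        return sum(value * value for value in vector)

    residual = error_vector(c)
    error = squared_norm(residual)
    for _ in range(max_iterations):
        adjustment = predict_update(residual)
        for k in (1.0, 0.5, 0.25):
            candidate = [ci - k * di for ci, di in zip(c, adjustment)]
            candidate_residual = error_vector(candidate)
            candidate_error = squared_norm(candidate_residual)
            if candidate_error < error:
                # Improvement found: accept it and start the next iteration.
                c, residual, error = candidate, candidate_residual, candidate_error
                break
        else:
            # No step size improved the error: treat this as convergence.
            return c, error
    return c, error
```

In a toy setting where the residual is simply the difference between the current and the correct parameters, a predictor that overshoots is rejected at k = 1 and accepted at k = 0.5, which is the behaviour the step-halving rule is meant to provide.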

Figure 2.15 shows an AAM being fitted to a face. The left-most image is the
starting shape and texture, which clearly does not fit well. The next image shows
the synthesised image after 2 iterations; the next after 8 iterations; then 14
iterations and then 20 iterations. The right-most shows the image when convergence
has occurred and fitting is complete.

Figure 2.15: The images show an AAM being fitted to a face. Convergence occurs
after more than 20 iterations. (Image taken from Cootes et al. (1998))

2.4.3 Procrustes Analysis

We saw in Section 2.4.1 that Procrustes analysis is used to align training shapes
with the mean training shape in one of the first steps of training an ASM. We
also saw that Procrustes analysis is used again when fitting an ASM to an image.
Procrustes analysis will also play a crucial role when we develop two measures of
asymmetry in the next chapter. In this section we prove a theorem that will be
used in the next chapter (and that is used in implementations of ASMs).

Procrustes analysis is a set of techniques in statistical shape analysis that is
concerned with shape comparison and solutions to various "Procrustes problems"
(the term "Procrustes analysis" dates back to Hurley and Cattell (1962)). Of
particular relevance to our work is the "Orthogonal Procrustes Problem" which
was solved by Schönemann (1966). Suppose that we have two sets of 2-dimensional
points, A and B, with each set containing N points (note that the problem and
solution can be generalised to n dimensions, but we shall only consider 2 dimensions
for simplicity). Suppose further that each set of points is normalised so that the
sum of the squares of their coordinates is 1 and that each set of points is centred on
the origin, i.e.,

    \sum_{i=1}^{N} x_{Ai} = \sum_{i=1}^{N} x_{Bi} = \sum_{i=1}^{N} y_{Ai} = \sum_{i=1}^{N} y_{Bi} = 0

A and B can be represented as matrices:

    A = \begin{pmatrix} x_{A1} & y_{A1} \\ \vdots & \vdots \\ x_{AN} & y_{AN} \end{pmatrix},
    \quad
    B = \begin{pmatrix} x_{B1} & y_{B1} \\ \vdots & \vdots \\ x_{BN} & y_{BN} \end{pmatrix}

The Orthogonal Procrustes Problem is to find the orthogonal 2 by 2 matrix, R,
that, when applied to A, minimises the Procrustes distance to B. The Procrustes
distance between A and B is defined as:

    \sum_{i=1}^{N} \left[ (x_{Ai} - x_{Bi})^2 + (y_{Ai} - y_{Bi})^2 \right]

In geometric terms, this amounts to finding the rotation and/or reflection that -
when applied to the first set of points - minimises the Procrustes distance between
the 2 sets of points.

Theorem. The orthogonal matrix R that minimises the Procrustes distance is
given by:

    R = U V^T

where

    B^T A = V W U^T

is the singular value decomposition of B^T A with U and V orthogonal matrices
and W a diagonal matrix.



Proof. The proof is based on the proof in Dryden and Mardia (1998) with some
details filled in.

We want to minimise ||B - AR|| where R is orthogonal (i.e., R R^T = R^T R = I).
This is the same as minimising trace((B - AR)^T (B - AR)). The equivalence
follows from the definition of Procrustes distance, and the definition of the trace
of a matrix as the sum of its diagonal elements.

Now, we have the following:

    trace((B - AR)^T (B - AR))
      = trace((B^T - R^T A^T)(B - AR))
      = trace(B^T B) - trace(R^T A^T B + B^T A R) + trace(R^T A^T A R)
      = trace(B^T B) - trace((B^T A R)^T + (B^T A R)) + trace((AR)^T AR)

We also have that trace(B^T B) = ||B|| = 1 (because the points have been
normalised) and that trace((AR)^T AR) = ||AR|| = ||A|| = 1 (because R is an
isometry). Also, trace((B^T A R)^T) = trace(B^T A R). Thus

    trace(B^T B) - trace((B^T A R)^T + (B^T A R)) + trace((AR)^T AR)
      = 2 - 2 trace(B^T A R)

which means that we want to find R such that trace(B^T A R) is maximised.

Apply singular value decomposition to B^T A to find U, V and W such that

    B^T A = V W U^T

with U and V orthogonal matrices and W a diagonal matrix with non-negative
entries.

Then, by the cyclic property of the trace, trace(B^T A R) = trace(V W U^T R) =
trace(U^T R V W) = trace(X W), where X = U^T R V is orthogonal (being a product
of orthogonal matrices). Write:

    X = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
    \quad
    W = \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}

Then trace(XW) = ae + df. Since X is orthogonal, a and d have absolute value
less than or equal to 1. Since e and f are non-negative, this means that the
maximum trace of XW is e + f, which is the trace of W.

Set R = U V^T. Then R maximises trace(B^T A R) because trace(B^T A R) =
trace(V W U^T R) = trace(V W U^T U V^T) = trace(V W V^T) = trace(W).

Thus, we have shown that R = U V^T minimises the Procrustes distance between
A and B and the proof is complete. □
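The theorem translates directly into a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, point sets are stored as N by 2 arrays with one point per row, and the distance returned is the sum of squared differences between corresponding points.

```python
import numpy as np


def normalise(points):
    """Centre an N x 2 point set on the origin and scale it so that the
    sum of the squares of its coordinates is 1."""
    P = np.asarray(points, dtype=float)
    P = P - P.mean(axis=0)
    return P / np.sqrt((P ** 2).sum())


def orthogonal_procrustes(A, B):
    """Return the orthogonal R minimising the distance between A R and B,
    together with that distance (sum of squared differences)."""
    # numpy returns B^T A = u @ diag(w) @ vh, so V = u and U^T = vh.
    V, w, Ut = np.linalg.svd(B.T @ A)
    R = Ut.T @ V.T  # R = U V^T
    distance = float(((B - A @ R) ** 2).sum())
    return R, distance
```

For normalised inputs the returned distance should agree with 2 - 2 trace(W), which offers a simple check on an implementation.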

Chapter 2 is now concluded. In the next chapter we report on the development
of two novel measures of facial asymmetry.

Chapter 3

Developing two measures of facial asymmetry

3.1 About this chapter 

In this chapter we describe the design of two measures of asymmetry and the steps
involved in their development. We start by presenting several software
implementations of ASMs and AAMs and explaining how we decided which implementation
to use as the basis of our measures of asymmetry. We test the selected
implementation by training it for a test video and evaluating its fitting performance.
We then go on to describe the development of the two measures in detail. The first
measure relies only on shape data (i.e., coordinates of landmarks of the face) whereas
the second measure uses both shape data and texture data (where texture data means
pixel values of all points on the face). After describing the two measures, we
compare their performance on the test video and decide which measure to prefer. We
end the chapter by discussing the limitations of the preferred measure.

3.2 Software selection 

The first step was to select a software implementation of either Active Shape
Models or Active Appearance Models that would be our starting point for developing
measures of asymmetry. Several implementations were tried and tested, according
to the following desiderata:

• Ideally the software would implement AAMs to allow the possibility of using
full appearance information (shape and texture) should appearance information
prove to enable a better measure of asymmetry than shape information
alone

• The source code should be available to allow customisation of the software
and complete understanding of its inner workings

• The fitting should be accurate, reliable and capable of operating with enough
speed to model videos of faces as well as single images of faces

Seven different software implementations were considered and four of these were 
tested. Table 3.1 summarises information about each of these implementations. 
Options 2, 5 and 6 were rejected because the source code could not be obtained. 
Option 1, which was in MATLAB, was capable of fitting shapes but was too slow 
to be used for videos. Options 3 and 4 worked well but were limited to ASMs and 
so were ultimately rejected. 





No. | Author             | Language       | ASM or AAM? | Source available? | Verdict
----+--------------------+----------------+-------------+-------------------+--------
1   | Ghassan Hamarneh   | MATLAB         | ASM         | Yes               | Too limited. No facility for tracking videos, without extra coding. Very slow tracking.
2   | Tim Cootes         | C++            | AAM         | No                | Source code was not available so this option was ruled out.
3   | Stephen Milborrow  | C++            | ASM         | Yes               | Worked well but ultimately rejected in favour of option 7 which was more versatile.
4   | Yao Wei            | C++            | ASM         | Yes               | By the same author as option 7, but limited to ASMs whereas 7 implements AAMs as well.
5   | Mikkel B. Stegmann | C++            | AAM         | No                | Although C++ source was meant to be available for download, it could not be obtained.
6   | George Papandreou  | MATLAB and C++ | AAM         | No                | Author did not respond to email requesting source code.
7   | Yao Wei            | C++            | AAM         | Yes               | Selected this. Fitted images and videos. Performed both training and fitting.

Table 3.1: Candidate software implementations of ASMs and AAMs (the associated
URLs can be found in Appendix A).



The AAM library by Yao Wei (number 7 in Table 3.1) was selected because
it met all of the desiderata: the source code was available; it allowed access to
both appearance information and shape information; it allowed both training and
fitting of models; it was much faster than the MATLAB implementation that was
tested (option 1); and it was recommended by two PhD students at the University
of Bristol who had used it in their research (Alexander Davies and Xiao Wang).

The library was compiled using the Eclipse Integrated Development Environment
in Ubuntu 11.04. It was necessary to compile OpenCV (Open Source Computer
Vision Library) beforehand. Version 2.2 of OpenCV was used.

3.3 Testing the AAM library

The first test of the AAM library was to train it on a face in a test video, and see 
if it would track the face reliably. The video used was provided by the Psychology 
Department at the University of Bristol. A three-minute segment was extracted 
from the video using a tool called ffmpeg. The segment was chosen to include 
a variety of facial expressions; the subject is seen talking, smiling and laughing 
and there is some head movement and rotation. To increase speed of fitting, 
the resolution was reduced using ffmpeg to 640 by 360 pixels from the original 
resolution of 1920 by 1080 pixels. 

3.3.1 Training 

To train the AAM it was necessary to select several frames from the video that
represented a range of expression and would allow the model to fit all frames in
the three-minute segment. This could have been done by manually selecting frames
on the basis of visual inspection, but an alternative methodology was adopted. To
select the first cohort of frames a small C++ program was written that selected
the frames using the principle of a polyline algorithm. A polyline algorithm is
essentially a method for simplifying a polyline. A three-minute segment of video -
at 25 frames per second - consists of 4,500 frames. Each frame is an image of 640
by 360 pixels, and each pixel has three integer values: one for the red value, one
for the green value and one for the blue value. Each frame can thus be represented
as a 691,200 (= 640 * 360 * 3) dimensional vector, with each value in the vector
between 0 and 255. A three-minute video can therefore be thought of as a set of
4,500 points in 691,200-dimensional space. If these points are connected, then the
result is a polyline. The polyline algorithm simplifies the polyline by selecting a
number of vertices that represent the "significant features" of the polyline.

The implementation of the polyline algorithm is straightforward. Choose an
integer value called the "Tolerance". The higher the tolerance, the fewer points
will be selected by the algorithm (i.e., the fewer frames will be selected from
the video). The tolerance value has to be chosen by trial and error: pick an
arbitrary value, run the program and see how many points it returns. If too few,
decrease the tolerance and run again; if too many, increase the tolerance and run
again. The program loads the first image from the video and sets it as the base
image. It then steps through the video frame by frame, each time calculating the
Euclidean distance between the base image and the given frame (recall that each
frame is a 691,200-dimensional vector, so we can calculate Euclidean distances
between frames). If the distance is greater than the tolerance then it saves a
copy of the frame, and sets the frame as the base image. Otherwise the frame is
not saved and the base image does not change.
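As a minimal sketch of the selection rule just described, the loop might look as follows. The names `Frame`, `euclideanDistance` and `selectFrames` are hypothetical, frames are assumed to be already decoded into flat vectors of channel values, and whether the original program also saved the first frame is not stated, so it is left out here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// A frame flattened to one vector of 8-bit channel values
// (640 * 360 * 3 entries for the test video).
using Frame = std::vector<std::uint8_t>;

// Euclidean distance between two frames of equal length.
double euclideanDistance(const Frame& a, const Frame& b) {
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sumSquares += d * d;
    }
    return std::sqrt(sumSquares);
}

// Returns the indices of the frames that would be saved for annotation.
// Frame 0 only serves as the initial base image (an assumption).
std::vector<std::size_t> selectFrames(const std::vector<Frame>& frames,
                                      double tolerance) {
    std::vector<std::size_t> selected;
    if (frames.empty()) return selected;
    std::size_t base = 0;
    for (std::size_t i = 1; i < frames.size(); ++i) {
        if (euclideanDistance(frames[base], frames[i]) > tolerance) {
            selected.push_back(i);  // save a copy of this frame
            base = i;               // and make it the new base image
        }
    }
    return selected;
}
```

Raising `tolerance` can only make the test inside the loop harder to pass, which is why a higher tolerance yields fewer selected frames.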

Using this method, five frames were selected from the video. These five were
annotated, each with 50 landmark points, marking the significant features of the
face. Annotation was done using a free tool built by Professor Tim Cootes available
on his website (the URL can be found in Appendix A). The tool takes an image
file, and allows the user to mark landmarks by hand (Figure 3.1 shows the tool in
action). The landmarks are then saved in a .pts file, which is a text file that stores
the x and y coordinates for each landmark. Figure 3.2 shows one of the annotated
frames.




Figure 3.1: Annotating an image using the markup tool written by Professor Tim 
Cootes and available on his website. 

After annotating five images, the model could be built using the AAM library. 
To do this, five images and five corresponding .pts files (containing the coordinates 
of the landmarks) are needed. These are placed in a folder that the AAM library 
can access, and after the model builder is run, a model file is created. This model 
file is used in the fitting stage. 



3.3.2 Fitting 

The AAM library's fitting program takes a video and model file as input, and
steps through the video frame by frame attempting to fit the face in each frame.
It outputs a video of the synthesised face built by the model. In Figure 3.3 an
example frame is shown. The left-most image shows the original frame, and the
middle image shows the attempt to fit the frame using a model created with 5
training images (each with 50 landmarks annotated). The right-most image shows
the fitting overlaying the original frame.

Figure 3.2: Example of a frame with 50 landmarks marked.

Figure 3.3: Example of fitting a frame with a model built from 5 training images.
Panels, from left to right: original frame; attempted fitting of frame; fitting
overlaying the original frame.

To measure the performance of the fitting, an OpenCV function calculating the L1
norm between the original frame and the fitted frame (i.e., left-most and
right-most images in Figure 3.3) was used. As we have already seen, images can be
expressed as vectors. If im1 and im2 are two colour images (each of resolution 640
by 360 pixels) then we can write them both as 691,200-dimensional vectors:

$$\mathrm{im1} = (r_1, g_1, b_1, \ldots, r_{230400}, g_{230400}, b_{230400})$$

and

$$\mathrm{im2} = (r'_1, g'_1, b'_1, \ldots, r'_{230400}, g'_{230400}, b'_{230400})$$

The L1 norm for im1 and im2 is then defined as:

$$\sum_{i=1}^{230400} \left( |r_i - r'_i| + |g_i - g'_i| + |b_i - b'_i| \right)$$

The L1 norm, then, is simply the sum of absolute differences between corresponding
pixel values across the two images. The lower the L1 norm between the original
frame and the fitted frame, the better the fitting performance.
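The definition can be sketched without OpenCV as below. The function name `l1Norm` is hypothetical and both images are assumed to be flattened to vectors of equal length. The text does not name the OpenCV function that was used; in OpenCV's C++ API, `cv::norm(a, b, cv::NORM_L1)` computes the same quantity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between corresponding channel values.
// Both images are assumed to have the same number of entries.
long long l1Norm(const std::vector<std::uint8_t>& im1,
                 const std::vector<std::uint8_t>& im2) {
    long long total = 0;
    for (std::size_t i = 0; i < im1.size(); ++i) {
        total += std::abs(static_cast<int>(im1[i]) - static_cast<int>(im2[i]));
    }
    return total;
}
```

For a 640 by 360 colour frame the largest possible value is 691,200 * 255, which is why the sum is held in a 64-bit integer rather than an 8-bit or 16-bit type.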

Calculating the L1 norm for each frame meant that the performance for one
frame could be compared to the performance for another, and also that the
performance of one model could be compared to the performance of another model,
by looking at each model's performance averaged across all frames. The frame
displayed in Figure 3.3 represents a fitting with average performance (i.e., the
median performance for all frames in that video). It can be seen that the fitting is
not particularly good. The colours do not look right and the shape of the mouth is
wrong. In addition, the eyes point in a different direction to their direction in the
original frame. However, since this model was the result of only 5 training images,
there is scope to improve the model, as we shall see.

Two modifications were made to the AAM library's fitting code. One was to
add a function that produced the frame with the fitting overlaying the original
frame (for example, the right-most image in Figure 3.3). This was achieved using
OpenCV functions. The second modification was to add a colour correction
function, which significantly improved the colour of the fitting. The colour
correction function starts by considering the red colour channel. It calculates the
average red pixel value across the original frame and subtracts the average red
pixel value across the fitted frame. The result is then added to each red pixel
value of the fitted frame (if this results in a red pixel value of less than 0, then 0
is chosen; if it results in a red pixel value of greater than 255, then the value is set
to 255). The same is then done for the green and blue colour channels. The function
made no noticeable difference to speed of fitting and substantially improved
fitting performance.
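A minimal sketch of the colour correction described above follows. The function name `colourCorrect` is hypothetical, pixels are assumed to be stored interleaved with three channel values each, and rounding the per-channel shift to the nearest integer is an assumption, since the text does not say how fractional shifts were handled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shifts each channel of `fitted` so that its mean matches the mean of the
// same channel in `original`, clamping every result to the range [0, 255].
void colourCorrect(const std::vector<std::uint8_t>& original,
                   std::vector<std::uint8_t>& fitted) {
    const std::size_t pixels = fitted.size() / 3;
    if (pixels == 0) return;
    for (std::size_t c = 0; c < 3; ++c) {
        double sumOriginal = 0.0, sumFitted = 0.0;
        for (std::size_t p = 0; p < pixels; ++p) {
            sumOriginal += original[3 * p + c];
            sumFitted += fitted[3 * p + c];
        }
        // Difference of channel means, rounded to a whole number (assumption).
        const int shift = static_cast<int>(
            std::lround((sumOriginal - sumFitted) / static_cast<double>(pixels)));
        for (std::size_t p = 0; p < pixels; ++p) {
            const int v = static_cast<int>(fitted[3 * p + c]) + shift;
            fitted[3 * p + c] =
                static_cast<std::uint8_t>(std::max(0, std::min(255, v)));
        }
    }
}
```

Because the correction is one addition per channel value, its cost is small next to the model fitting itself, which is consistent with the observation that it made no noticeable difference to speed.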

We expected that increasing the number of training images would improve
fitting performance, but with diminishing returns (Gross et al. (2005)
provide support for this expectation). Having created a model from 5 training
images, and having attempted to fit this model to the test video, it was possible to
see which frame fitted least well by looking at the L1 norm between each original
frame and its corresponding fitted frame. The frame which was fitted least well
was annotated and added to the original 5 frames to build a model with 6 images.
This model was then fitted to the video; its average performance was recorded
(i.e., average L1 norm across all frames); and the worst performing frame was used
to create a model with 7 training images. This process was repeated until a model
had been built with 20 training images. This allowed us to plot a chart indicating
the impact on fitting performance of the number of training images, for the test
video. Figure 3.4 shows average fitting error (i.e., the average L1 norm between
each original frame and its corresponding fitted frame for the video) against the
number of training images used. In all cases, 50 landmarks were used in the training
set. It can be seen that earlier additions to the number of training images make
more substantial improvements to performance than later additions; for example,
increasing the number of training images from 5 to 6 has more of an impact on
performance than increasing the number from 19 to 20. A trend line has been
plotted for the points. We can see from the trend line that 20 training images is
by no means the optimum number of images for this test video, but for the present
purpose of establishing that the AAM library can achieve a good fitting, we shall
see that it is enough.




Figure 3.4: The chart shows average fitting error across frames on the y-axis
against number of training images used to build the model on the x-axis. A trend
line has been added.



Figure 3.5 shows the performance of the model created using 20 training images 
for two frames. The first frame is the frame whose fitting error is the median value 
of the set of all fitting errors for the frames in the video. It can therefore be viewed 
as representative of the average performance of the model for the test video. The 
second frame has a fitting error in the 95th percentile. In other words, 95% of 
frames have fitting error less than or equal to the error for the second frame. 



Figure 3.5: The fitting performance - for two frames - of the model created using
20 training images. The top row shows the frame with median error and the bottom
row shows the frame with 95th percentile error.

Inspection by eye of the fitted frames in Figure 3.5 shows that the result
achieved by using 20 training images is a good one. On the top row (median
error), all facial features are faithfully represented. The only noticeable difference
between the original face and the synthesised face is a lack of sharpness in the
latter. In addition, more lower teeth are exposed in the synthesised face than the
original, but this is only a subtle difference. Since 50% of frames were fitted better
than this frame, it was clear that the AAM was performing well. Also notice that
the colours of the synthesised model have been greatly improved in comparison to
Figure 3.3. This was a result of the colour correction function. On the bottom row
we can see a frame that is fitted less well, but the result is still satisfactory. Again
there is a lack of sharpness in the synthesised image and the shape of the mouth
appears slightly different. In addition, the eyes in the synthesised image appear
to be directed slightly differently. However, all in all, performance in the bottom
row is also good, and 95% of frames in the test video showed better performance.

By this stage it was deemed that the AAM library was sufficient for the purpose 
of building a measure of asymmetry, because it was able to accurately model faces 
in videos and locate facial features. The next step was to build on the library a 
means of measuring facial asymmetry. 



3.4 Two measures of facial asymmetry 

In the end, two measures of asymmetry were developed. The first measure
was simpler because it only used shape information to measure asymmetry. The
second measure built on the first measure by using texture information in addition
to shape information. In this section we describe the two measures, compare the
two and explain why the first was ultimately preferred to the second.



3.4.1 Measure One: Using shape information only 

To develop the first measure of facial asymmetry, we started by creating a short 
test video with the following characteristics: 

• First, the face was pointed directly to the camera, with minimal left or right
rotation. The reason for this was that left or right rotation would distort a
2-dimensional measure of asymmetry (we show why this is in Section 3.5.2).

• Second, the video showed the face with neutral expression punctuated by
asymmetric expressions such as raising one eyebrow or raising one side of
the mouth.

The AAM was trained in the usual way and training images were selected
as above. That is, first a polyline algorithm was used to select 5 images, and
then further images were added to improve tracking where necessary. Once a
satisfactory fitting of the test video was achieved, we could begin to develop the
measure. In the training phase we had used 68 landmarks on each image (see
Figure 3.6 for an example of an annotated training image). The AAM library
stored shape data during fitting in a matrix. This meant that we could access
this matrix for each fitted frame and extract the x and y coordinates for the 68
points. If the fitting was accurate then these coordinates would represent the same
landmarks of the face that were annotated in the training phase. We could then
use the coordinates of these points to measure the degree of asymmetry of the face
in the given frame using Procrustes analysis. In the following we break down the
steps required to perform this calculation on a given frame.

Figure 3.6: An annotated training image from the test video.






Step One 

Given a frame, extract the shape data for the fitting and separate the 68 points
into those that belong to the left side of the face and those that belong to the right
side. To separate the points we calculate the median value of the x-coordinates
of the 68 points. Points with x-coordinate less than the median are deemed to be
for the left side of the face; points with x-coordinate greater than the median are
deemed to be for the right side of the face. Separating the points in this way results
in two subsets of points, each with 34 points. (Using the median of the x-coordinates
for the separation relies on the face being relatively upright in each frame; this
assumption was reasonable because of the way we opted to film the videos. See
the later section on pose variation (3.5.2) for more detail.)
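The median split can be sketched as follows. The names `Point` and `splitByMedianX` are hypothetical, "left" here means smaller x-coordinate in the image, and the handling of a point whose x-coordinate equals the median exactly is an assumption, since the text does not cover that case.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x; double y; };

// Splits landmarks into (left, right) around the median x-coordinate.
// A point lying exactly on the median is placed on the right (assumption).
std::pair<std::vector<Point>, std::vector<Point>>
splitByMedianX(const std::vector<Point>& pts) {
    std::vector<Point> left, right;
    if (pts.empty()) return {left, right};
    std::vector<double> xs;
    for (const Point& p : pts) xs.push_back(p.x);
    std::sort(xs.begin(), xs.end());
    const std::size_t n = xs.size();
    const double median =
        (n % 2 == 1) ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
    for (const Point& p : pts) (p.x < median ? left : right).push_back(p);
    return {left, right};
}
```

With 68 landmarks and no point on the midline, the median lies between the 34th and 35th sorted x-coordinates, so each side receives 34 points as stated above.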



Figure 3.7 shows three images for a frame of the test video. The left-most is 
the frame itself. The middle shows the fitted points overlaying the frame, and the 
right-most frame shows the same points separated into left-side points (coloured 
green) and right-side points (coloured blue). 



Figure 3.7: Plotting shape data for a fitted frame and separating according to side 
of face. 



Step Two 

Once we have the two sets of points representing each side of the face, and they 
are in suitably dimensioned matrices (i.e., 2 columns by 34 rows, for the (x,y) 
coordinates of each of the 34 points) we can perform Procrustes analysis to deter- 
mine the rotation, reflection and translation that brings the right-side set of points 
closest to the left-side set, where the notion of distance is Procrustes distance. Pro- 
crustes analysis is applied to the two matrices containing the coordinates of the 
points. There are a number of steps, which we now detail. Functions were written 
to perform these steps in C, using OpenCV matrix manipulation functions where 
appropriate. 

• First, both matrices are centred on the origin by calculating their centroids
and translating by the negative of their respective centroids (the centroid
of a set of n points, $(x_i, y_i)$, means the point $(\bar{x}, \bar{y})$ where
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
are the means of the set of x-coordinates and y-coordinates respectively).

$$(x_i, y_i) \rightarrow (x_i - \bar{x},\; y_i - \bar{y})$$

• Second, both matrices are normalised so that the sum of the squares of their
values is 1.

$$(x_i, y_i) \rightarrow \left(\frac{x_i}{C}, \frac{y_i}{C}\right) \quad \text{where} \quad C = \left(\sum_{i=1}^{n} \left(x_i^2 + y_i^2\right)\right)^{1/2}$$

• Third, the first matrix (that represents the left-side set of points) is
transposed and multiplied by the second matrix (that represents the right-side
set of points). Singular value decomposition is applied to the resulting 2 by
2 matrix, to decompose the matrix into a product of three 2 by 2 matrices;
call these matrices $L$, $D$ and $M^T$ where $L$ and $M$ are orthogonal matrices
and $D$ is a diagonal matrix.

• Fourth, the orthogonal matrix that represents the rotation and reflection
that minimises Procrustes distance (call it $T$) is found by taking the product
of $M$ with the transpose of $L$. I.e., $T = ML^T$. We have proved in
Section 2.4.3 that $T$ is the minimising matrix.



• Fifth, the translation component of the transformation that minimises the 
Procrustes distance is found by applying T to the original centroid of the 
right-side set of points and subtracting the result from the original centroid 
of the left-side set of points (the centroids were calculated in the first step). 

Step Three 

Given a frame, and the reflection, rotation and translation that minimise the
Procrustes distance, we apply the transformation to the right-side set of points
and measure the Procrustes distance between the resulting set of points and the
left-side set of points. The distance is our measure of asymmetry for the frame.
The larger the distance, the larger the asymmetry. In this way we were able to
develop a program that - when given a video of a face and a model trained for
that face - could assign a frame-by-frame measure of facial asymmetry to the video.
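Steps Two and Three can be sketched in plain C++ as below. This is not the original OpenCV code. Several assumptions are made: the names `Point`, `centroid` and `asymmetryScore` are hypothetical; `left[i]` and `right[i]` are taken to be corresponding landmarks; the singular value decomposition is replaced by an equivalent closed form that exists for 2 by 2 matrices; and the final distance is taken to be the square root of the sum of squared point-to-point distances, which the text does not spell out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x; double y; };

// Mean of a non-empty set of points.
Point centroid(const std::vector<Point>& pts) {
    Point c{0.0, 0.0};
    for (const Point& p : pts) { c.x += p.x; c.y += p.y; }
    c.x /= static_cast<double>(pts.size());
    c.y /= static_cast<double>(pts.size());
    return c;
}

// Asymmetry score for one frame. Both sets are assumed non-empty, equal in
// size, and ordered so that left[i] is the counterpart of right[i].
double asymmetryScore(const std::vector<Point>& left,
                      const std::vector<Point>& right) {
    const Point cl = centroid(left);
    const Point cr = centroid(right);

    // N = B^T A, with A the centred left points and B the centred right points.
    // Scaling A and B to unit norm would multiply N by a positive constant and
    // leave the best orthogonal matrix unchanged, so that step is skipped here.
    double n11 = 0.0, n12 = 0.0, n21 = 0.0, n22 = 0.0;
    for (std::size_t i = 0; i < left.size(); ++i) {
        const double ax = left[i].x - cl.x, ay = left[i].y - cl.y;
        const double bx = right[i].x - cr.x, by = right[i].y - cr.y;
        n11 += bx * ax; n12 += bx * ay;
        n21 += by * ax; n22 += by * ay;
    }

    // The best rotation [[c, -s], [s, c]] and the best reflection
    // [[c, s], [s, -c]] each maximise sum(T_ij * N_ij); keep the better one.
    const double rot = std::hypot(n11 + n22, n21 - n12);
    const double ref = std::hypot(n11 - n22, n12 + n21);
    double t11 = 1.0, t12 = 0.0, t21 = 0.0, t22 = 1.0;
    if (ref > rot) {
        const double c = (n11 - n22) / ref, s = (n12 + n21) / ref;
        t11 = c; t12 = s; t21 = s; t22 = -c;
    } else if (rot > 0.0) {
        const double c = (n11 + n22) / rot, s = (n21 - n12) / rot;
        t11 = c; t12 = -s; t21 = s; t22 = c;
    }

    // Map each right point b to (b - cr) T + cl, which equals b T plus the
    // translation cl - cr T, then sum squared distances to the left points.
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < left.size(); ++i) {
        const double bx = right[i].x - cr.x, by = right[i].y - cr.y;
        const double dx = left[i].x - (bx * t11 + by * t21 + cl.x);
        const double dy = left[i].y - (bx * t12 + by * t22 + cl.y);
        sumSquares += dx * dx + dy * dy;
    }
    return std::sqrt(sumSquares);
}
```

For a perfectly mirror-symmetric pair of point sets the reflection branch is chosen and the score is zero; displacing a single landmark on one side makes the score positive.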



In Figure 3.8 we show this measure at work on a neutral and relatively symmetric 
face. We can see that after rotating, reflecting and translating the right-side set of 
points they are fairly close to the left-side set of points. The Procrustes distance 
for this frame was found to be 20.52 (to two decimal places).

Figure 3.8: The top row shows the frame with left-side and right-side points fitted.
The second row simply shows these points on a black background (the vertical 
white line in the centre of this row represents the y-axis (i.e., the line x = 0)). The 
third row shows the left-side points and the right-side points after the latter have 
had the rotation and reflection applied. The fourth row shows the left-side points 
and the right-side points after the rotation, reflection and translation have been 
applied. The Procrustes distance between corresponding points in the fourth row 
is the measure of asymmetry for the frame. It was found to be 20.52. 



In Figure 3.9 we show the measure at work on a deliberately asymmetric face.
The Procrustes distance for this frame was found to be 94.30 (to two decimal
places).




3.4.2 Measure Two: Using shape and texture information 

The second measure of asymmetry developed used both shape and texture
information. The idea here was, given a frame, to fit the frame using the AAM (so
we would have both shape and texture information); determine an axis of symmetry
for the fitting; reflect the texture information from one side of the face across to
the other side of the face; and measure the average absolute difference in
corresponding pixel values between the reflected side and the unreflected side.

The first step was to decide how to determine an axis of symmetry for each 
frame. It was decided to take the following approach. First, extract the shape data 
and centre the points on the origin by calculating their centroid and subtracting 
the centroid from each pair of coordinates. Second, split the points into those 
for the left-side and those for the right-side (as with the first measure). Third, 
apply Procrustes analysis to determine the transformation that, when applied to 
the right-side points, minimised the Procrustes distance to the left-side points. 
As already seen, Procrustes analysis yields an orthogonal matrix representing the 
rotation and reflection. We can use this matrix to infer the axis of symmetry. The 
matrix has the form: 

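$$\begin{pmatrix} -\cos\theta & \sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

This is equivalent to reflecting in the straight line $y = mx$ where

$$m = \frac{\cos\frac{\theta}{2}}{\sin\frac{\theta}{2}}$$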



Figure 3.10 shows a frame from the test video with the shape data overlaying the 
image and with the axis of symmetry (calculated in the way described) drawn as 
a white line. 
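A small sketch of how the slope of the axis might be recovered from the matrix is given below. The function name `axisSlope` is hypothetical, and the matrix is assumed to be a pure reflection of the form shown above, so only its top row is needed. A perfectly vertical axis corresponds to $\theta = 0$ and an infinite slope, which is returned explicitly here.

```cpp
#include <cmath>
#include <limits>

// t11 and t12 are the top-row entries of a reflection matrix assumed to have
// the form [[-cos(theta), sin(theta)], [sin(theta), cos(theta)]].
// Returns the slope m of the mirror line y = m x.
double axisSlope(double t11, double t12) {
    const double theta = std::atan2(t12, -t11);  // recover theta
    const double s = std::sin(theta / 2.0);
    if (s == 0.0) {
        // theta = 0: the mirror line is the y-axis, so the slope is infinite.
        return std::numeric_limits<double>::infinity();
    }
    return std::cos(theta / 2.0) / s;
}
```

As a check, the matrix that swaps x and y has top row (0, 1), and the function returns a slope of 1 for it, i.e. the line y = x.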




Figure 3.10: A fitted frame from the test video with left-side and right-side points 
marked, and the axis of symmetry drawn in white. 



The second step was to split the texture information according to the axis of
symmetry. This provides a fit to the left side and a separate fit to the right side.
Figure 3.11 indicates what this split looks like.

Figure 3.11: The top row shows the fitting provided by the AAM model. The
second row shows it split into left and right sections according to the calculated
axis of symmetry.

The third step is to reflect one side of the fitting across the axis of symmetry.
Without loss of generality, the right side was reflected. To perform the reflection,
we used the fact that reflection of a point $(x', y')$ in a line $y = mx$ takes the
point to:

$$\left(\frac{2my' - (m^2 - 1)x'}{1 + m^2},\; \frac{(m^2 - 1)y' + 2mx'}{1 + m^2}\right)$$

(See Appendix B for the proof.)

The final step is to convert both the left side and the reflected right side to
grayscale and to normalise them with respect to average pixel value. The reason we
do this is to reduce the likelihood of error in our measure of asymmetry caused by
one side of the face being more illuminated than the other due to uneven lighting
during recording. (To perform the normalisation we increment (or decrement)
the pixel values of the reflected right side by a constant so that its average pixel
value is the same as the average pixel value of the left side.) Finally, we crop
each image so that their shapes are identical, keeping as much of each image as
possible. The result of these operations is shown in Figure 3.12.

Figure 3.12: The left image shows the left side of the face, unadjusted. The right
image shows the right side of the face, reflected in the axis of symmetry. Below
the images is a histogram showing the distribution of absolute differences between
corresponding pixel values for the two images. The mean absolute difference is
18.38 and the median is 16. Our measure of asymmetry hypothesises that the
higher these numbers, the greater the facial asymmetry.

The mean absolute difference between corresponding pixel values for the two
grayscale images is 18.38 and the median is 16. In Figure 3.13 we apply the same
measure to a deliberately asymmetric face. The mean absolute difference in pixel
value in this image is 24.82 and the median is 20. This agrees with our expectation
that the higher the asymmetry the higher the average absolute difference between
corresponding pixel values for the unreflected left side and the reflected right side.
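The normalisation and the two summary statistics can be sketched as follows. The function name `absDifferenceStats` is hypothetical; the two grayscale images are assumed to be already cropped to the same shape and flattened to equal-length, non-empty vectors; and shifted values are not clamped to [0, 255], which is an assumption because the text does not say whether clamping was applied.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {mean, median} of |left[i] - right[i]| after shifting `right` by a
// constant so that both images have the same average pixel value.
std::pair<double, double> absDifferenceStats(const std::vector<double>& left,
                                             const std::vector<double>& right) {
    const std::size_t n = left.size();
    double sumLeft = 0.0, sumRight = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sumLeft += left[i];
        sumRight += right[i];
    }
    const double shift = (sumLeft - sumRight) / static_cast<double>(n);

    std::vector<double> diffs(n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        diffs[i] = std::fabs(left[i] - (right[i] + shift));
        total += diffs[i];
    }
    std::sort(diffs.begin(), diffs.end());
    const double median =
        (n % 2 == 1) ? diffs[n / 2] : 0.5 * (diffs[n / 2 - 1] + diffs[n / 2]);
    return {total / static_cast<double>(n), median};
}
```

If the right image is simply a uniformly brighter or darker copy of the left, the shift cancels the difference and both statistics are zero, which is the behaviour the normalisation is meant to provide.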

Figure 3.13: The top row shows a deliberately asymmetric face with the axis of
symmetry drawn. On the second row we see the left and right fittings of the face.
On the third row we see the left side of the face, and the right side reflected across
the axis, both in grayscale. The histogram shows the distribution of absolute
differences between corresponding pixel values for the two grayscale images. The
mean absolute difference is 24.82 and the median is 20. Comparing this histogram
to the histogram in Figure 3.12 reveals a longer tail, indicative of greater difference
between the grayscale images.

The second measure can be subdivided into two distinct but closely related
measures; we can either use the mean absolute difference in pixel values or the
median. The reason why we may want to use the median rather than the mean is
that the mean - as a measure of the average of a distribution - can be distorted if
the distribution is skewed. If, for example, there are a small number of pairs of
pixels where the absolute difference is very high, the mean absolute difference may
be high even though the two grayscale images are similar. To see if using the median
produced a significantly different measure to using the mean, we used both methods
to record the degree of asymmetry for each of the 1,250 frames of the test video.
We then calculated the Pearson correlation coefficient between the mean absolute
pixel difference and the median absolute pixel difference. (The Pearson correlation
coefficient between two variables is their covariance divided by the product of their
standard deviations - the maximum possible value is 1 and the minimum possible
value is -1.) The Pearson correlation turned out to be 0.93, indicating a strong
degree of dependence between the variables. We also plotted both variables against
frame number and found that there was no interesting difference between using
the median and using the mean.
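The Pearson correlation coefficient as defined in the parenthesis above can be computed with a short function. The name `pearson` is hypothetical, and the two series are assumed to have equal length with non-zero variance.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Covariance of x and y divided by the product of their standard deviations.
// The common 1/n factors cancel, so raw sums of deviations are used.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

In the setting above, `x` would hold the per-frame mean absolute differences and `y` the per-frame medians, one entry for each of the 1,250 frames.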



3.4.3 Comparing the measures 



We now have two different measures of facial asymmetry. The first was described
in Section 3.4.1 and relies only on shape information; call this "Measure One". The
second was described in Section 3.4.2 and uses both shape and texture information
to find the mean absolute difference between pixel values for one side of the face
and the other; call this "Measure Two". To compare these two measures we
performed both on all 1,250 frames of the test video and plotted the results in
Figure 3.14. The magenta line shows asymmetry by frame number according to
Measure One and the blue line shows asymmetry by frame number according to
Measure Two. The Pearson correlation between the two measures was calculated
to be 0.90, indicating a high degree of dependence between the variables.

Figure 3.14: Chart showing measured facial asymmetry on the y-axis against frame
number on the x-axis for the test video. The magenta line is for Measure One and
the blue line is for Measure Two.

On the graph we have displayed four frames at points of interest. The first
frame (left-most) is a neutral expression. Both measures record a relatively low
level of asymmetry. The second frame is a neutral expression with the eyebrows
raised and the eyes closed. Both measures record a small jump in asymmetry at
this point, which, as we shall see again in Section 4.2.2, is due to the left eyebrow
arching more than the right and the left eyelid lowering more than the right during
blinking. The third frame is a deliberately asymmetric frame where the left corner
of the mouth is deliberately raised. Both measures record a jump in asymmetry,
but the jump is much greater for Measure One. Measure Two records a level of
asymmetry no greater than the level that it records for the second frame, which
seems wrong. The fourth frame is another deliberately asymmetric frame where
the left eyebrow is raised. Again, both measures record an increase in asymmetry,
but the jump is greater for the first measure.

On the basis of this video and other considerations we decided to prefer Measure 
One and discard Measure Two. The reasons for this were that: 



• Although the two measures were highly correlated for the test video (with
a Pearson correlation coefficient of 0.90), Measure One performed better and
more closely matched our expectations than Measure Two. In particular,
Measure One correctly identifies that frames 3 and 4 display greater
asymmetry than frame 2, whereas Measure Two records frame 2 as the most
asymmetric point of the video.

• Measure One is customisable because it allows the user to choose which parts 
of the face should figure in the measure of asymmetry. To include a part of the 
face in the measure, the user simply needs to annotate it during the training 
phase. Furthermore, if the user wants certain parts of the face to weigh more 
in the measure of asymmetry, then the user simply has to add extra training 
points to those parts of the face. Measure Two is not customisable which 
means that the user cannot choose to exclude any asymmetry of pixel values 
from the measure. For example, if the subject looks to one side this will 
create asymmetry that Measure Two cannot ignore. Measure One can either 
include or ignore the asymmetry depending on whether the user annotates 
the pupils of the eyes in the training stage. 

• Measure Two can be affected by lighting variation during recording of the 
video whereas Measure One cannot. 

For these reasons, we discard Measure Two at this point. In the next section we 
look at the limitations of Measure One. 



3.5 Limitations of the measure 
3.5.1 Fitting noise 

There are two main limitations of our measure (i.e., Measure One). The first is
that it is only as good as the shape data provided by the fitting. If the shape data
do not correctly locate the points of the face that they are meant to locate, then
the measure of asymmetry will be incorrect. There are two methods that we can
use to check that the fitting reaches a certain standard. However, since both of
these methods can only show that the fitting is accurate up to a certain point, our
measure of asymmetry necessarily has room for a small degree of error.

The first way to verify that the fitting reaches a certain standard is to watch 
the video with the shape data overlaying the frames. If points are incorrectly 
located then the viewer will typically spot this. Experimentation showed that the 
video can be watched back at double speed and the viewer can still easily identify 
when points are incorrectly located (and can respond by adding extra images to 
the training set). However, checking the shape data by eye necessarily limits the 
degree of accuracy that we can be sure of achieving (and is time-consuming). The 
viewer is unlikely to spot that a point is out of place by just two or three pixels. 
And a viewer is more likely to miss a misplaced point if they need to check a lot 
of frames. 



Recall that in Section 3.3.2 we showed that we can measure the fitting error
for a given frame by taking the L1 norm of the difference between the original
frame and the fitted frame. We would like an analogous way of automatically
measuring the fitting error of the shape data alone. However, since we have no
way of accessing the shape data independently of the fitting, we must rely
on checking the fitting by eye.

The second way that we can verify that the fitting reaches a certain standard 
is as follows. Remember that the first measure involves rotating, reflecting and 
translating the right-side set of points to minimise the Procrustes distance to the 
left-side set of points. If there are 34 points in each side's set of points then we 
can record the 34 distances between corresponding points for each frame. We 
can then sort these distances (for all frames) and look at the frames that have the
largest distances. The reason is that the largest distances could be due to fitting 
errors and so they should be checked. If the largest distances are due to facial 
asymmetry rather than fitting error, then we can be confident that there are no 
other substantial fitting errors in the video. This process was completed for the 
test video, and we found that the largest distances were due to asymmetry and 
not fitting error. 
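
The ranking step described above can be sketched as follows. This is a minimal illustration in Python rather than the implementation used in the project. The function names (align_rigid, point_distances, rank_frames) are hypothetical, landmarks are assumed to be (x, y) tuples with corresponding left-side and right-side points stored in the same order, reflection is assumed to be a flip of the x-coordinate, and each frame is ranked by its single largest point-to-point distance (a simplification of sorting all distances).

```python
import math


def centre(points):
    """Translate a list of (x, y) points so that their centroid is the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def align_rigid(source, target):
    """Least-squares alignment of source onto target using translation and rotation only.

    Returns both point sets in centred coordinates, with source rotated onto target.
    """
    src, tgt = centre(source), centre(target)
    s = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(src, tgt))
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(src, tgt))
    theta = math.atan2(s, c)  # closed-form optimal rotation angle in 2D
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = [(ax * cos_t - ay * sin_t, ax * sin_t + ay * cos_t) for ax, ay in src]
    return rotated, tgt


def point_distances(left, right):
    """Distances between corresponding points after reflecting and aligning the right side."""
    mirrored = [(-x, y) for x, y in right]  # assumed reflection: flip the x-coordinate
    aligned, target = align_rigid(mirrored, left)
    return [math.dist(p, q) for p, q in zip(aligned, target)]


def rank_frames(frames, top=10):
    """frames is a list of (left_points, right_points) pairs, one per video frame.

    Returns up to `top` tuples (largest_distance, frame_index, point_index),
    largest first, so that the most suspicious frames can be checked by eye.
    """
    worst = []
    for index, (left, right) in enumerate(frames):
        distances = point_distances(left, right)
        point = max(range(len(distances)), key=distances.__getitem__)
        worst.append((distances[point], index, point))
    worst.sort(reverse=True)
    return worst[:top]
```

The frames returned at the head of the list are the ones to inspect: if their large distances are explained by genuine asymmetry, the remaining frames are unlikely to contain substantial fitting errors.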

In summary, our measure of asymmetry is only as good as the accuracy of the 
fitted shape data, and we can only show that the fitting is accurate up to a certain 
point, so our measure of asymmetry necessarily has room for a small degree of 
error. 

3.5.2 Pose variation 

The second limitation of our measure is that rotational, lateral movement of the 
head will affect the measure of facial asymmetry even though the facial expression is 
constant and so the actual level of asymmetry is unchanged. Figure 3.15 illustrates 
this point. The left-most image shows the head rotated to the right by a small 
angle and the right-most image shows it rotated to the left by a small angle. 
The centre image has minimum rotation and is intended to be straight on. The 
measure of asymmetry for each image is calculated. Since the facial expression is 
neutral in all three images, we would wish the measure of asymmetry to be the 
same in all three images. However, the left-most image is recorded as the most 
asymmetric (with a score of 31.57) and the right-most image is recorded as the 
second most asymmetric (with a score of 24.20). The score for the middle image 
- which represents the actual degree of asymmetry - is 19.19. The reason why 
our measure is affected by lateral rotation is that this kind of rotation makes the 
distance between a point on the face and the camera dependent on the point's x- 
coordinate (if we assume that the x-axis is perpendicular to the axis of symmetry), 
and this in turn means that features appear larger or smaller according to the side 
of the face they lie on and their distance from the axis of symmetry. 




Figure 3.15: Three images illustrating that our measure of asymmetry is affected 
by lateral head rotation. Below the images we can see the landmarks from the left- 
side of the face and the landmarks from the right-side of the face after reflection, 
rotation and translation. Recall that Procrustes distance between corresponding 
points is our measure of asymmetry. The Procrustes distance for the left image is 
31.57; the Procrustes distance for the middle image is 19.19; and the Procrustes 
distance for the right image is 24.20. 

In response to this limitation, videos must be filmed with minimal lateral rotation 
of the head, and with the head pointing directly towards the camera. To 
eliminate pose variation, the camera was attached to the head so that its position 
relative to the head was held constant. Unfortunately, it is not possible to 
have the camera perfectly straight on, as it may always be misaligned by a matter 
of millimetres. However, it is possible to have a constant alignment using this 
technique, and, since we are most interested in relative changes of asymmetry 
as emotional expression changes, rather than an absolute measure of asymmetry, 
constant alignment is more important than perfect alignment. 

We undertook some research into overcoming this limitation by attempting 
to manipulate the image of a laterally rotated head to eliminate the effect of the 
rotation on our measure of asymmetry. To do this we used OpenCV's FindHomography 
and WarpPerspective functions. Imagine that the face can be represented 
by a plane. Lateral rotation of the face then amounts to lateral rotation of this 
plane. The FindHomography function is used to compute the homography matrix 
that represents the rotation of the plane. To use the function, we need to provide 
it with a set of coordinates for points on the plane prior to the rotation, and with 
a set of coordinates for the corresponding points after the rotation. In the case of 
rotation of the face, we choose a set of points on the face that are closest to lying 
in a plane. The function needs a minimum of 4 points, but more points are more 
likely to produce a good result. 

Once we have computed the homography matrix we can use the WarpPerspective 
function to apply the perspective transform represented by the matrix to 
an image. The WarpPerspective function takes the homography matrix as input, 
along with the source image and the destination image. If we wish to attempt to 
eliminate the effect of the rotation on the face then we should use the inverse of 
the homography matrix rather than the homography matrix itself. 
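
To make the mathematics of the two steps concrete, the sketch below estimates a homography from exactly four point correspondences and then inverts it. It is a pure-Python stand-in written for illustration, not the OpenCV code used in the project: the names solve_linear, homography_from_4_points, invert_3x3 and apply_homography are hypothetical, and the matrix is normalised so that its bottom-right entry is 1. In OpenCV's Python bindings the corresponding library calls are cv2.findHomography (which also accepts more than four points) and cv2.warpPerspective (which applies the matrix to a whole image rather than to individual points).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Homography H (3x3, H[2][2] = 1) mapping each src point onto its dst point.

    src: four (x, y) points on the plane before rotation.
    dst: the four corresponding points after rotation.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate; used to undo the rotation."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adjugate]


def apply_homography(H, point):
    """Map a single (x, y) point through H, including the perspective divide."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v
```

Applying the inverted matrix to the landmarks of the rotated face maps them back to where they would lie on the unrotated plane, which is the point-wise analogue of warping the image with the inverse homography.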

To test the approach we calculated the homography matrix for a set of points 
on the middle image in Figure 3.15 and the corresponding set of points for the left- 
most image in the figure. This provided the homography matrix that estimated the 
transformation taking the middle image to the left-most image. We then inverted 
the matrix so that we had the transformation taking the left-most image to the 
middle image; in other words, the transformation to neutralise the rotation we see 
in the left-most image. Figure 3.16 shows the left-most image on the left - before 
we apply the inverted homography matrix - and then the image after we apply 
the inverted homography matrix (using the WarpPerspective function). Whilst 
it appears that the face has been somewhat "rotated" back towards the camera, 
calculating the asymmetry for both of the images in the figure reveals that the 
level of asymmetry is only negligibly reduced. If the method had performed as 
desired, the level of asymmetry after the inverted homography matrix had been 
applied would be significantly lower than before the application, and would be 
close to the measure recorded for the central image in Figure 3.15. 

Figure 3.16: The left image shows the slightly rotated head before the WarpPerspective 
function has been applied and the right image shows the image after the 
function has been applied. Asymmetry only slightly decreases; in the left image it 
is 31.57 and in the right image it is 30.15. 

Calculating the homography matrix by using points on the face will always be 
an approximation because the human face is not a planar surface. However, it was 
hoped that by experimenting with different selections of points we could find an 
approximation good enough to allow our measure of asymmetry to be relatively 
unaffected by lateral rotation. We were unable to achieve this and decided instead 
to rely on fixed camera alignment by attaching the camera to the subject's head. 
A possible avenue of future research is to enhance our measure of asymmetry so 
that it is capable of coping with a range of pose variation. 

This completes Chapter 3. In the next chapter we use Measure One to investigate 
the relationship between facial asymmetry and happiness. 

Chapter 4 

Applications of the measure 



4.1 About this chapter 

In the previous chapter we developed two measures of asymmetry, compared them, 
and decided that the first - which relies only on shape data - was preferable. We 
also described the measure's two main limitations; the first is that it is only as good 
as the fitting and the second is that lateral pose variation will affect the measure. 
However, if these two limitations are negated by ensuring that the fitting is good 
and that pose is constant we have an automatic frame-by-frame measure of facial 
asymmetry. Once the model is trained on a video it can calculate the level of 
asymmetry in any frame without relying on human input to annotate the frame or 
make measurements (this is in contrast to many early studies of asymmetry and 
emotion where humans were needed to measure the asymmetry in every image 
individually) . This means that complete videos can be analysed and large datasets 
collected. In this chapter we collect some data and look at ways we can analyse 
them. 

4.2 Facial asymmetry and happiness 

Due to time constraints, a single emotion was selected, and the relation between 
that emotion and facial asymmetry was studied in depth. It was decided that it 
would be more interesting to look at one emotion in detail rather than look at 
several emotions superficially. The emotion we selected was happiness; the reason 
was that it is easier to naturally elicit than, for example, fear or sadness. The work 
done on happiness could be generalised to other emotions providing we have videos 
of subjects experiencing those emotions, with constant pose and a good fitting. 

4.2.1 Fitting 



A ten-minute video was taken of the author viewing a comedy program on his 
computer. This elicited plenty of smiling, smirking and laughing and the video 
was devoid of negative emotions such as anxiety or sadness. To train the model we 
first selected ten frames using a polyline algorithm (we explain how selection with 
a polyline algorithm works in Section 3.3.1). The model was then built and fitted 
to the video. To establish the quality of the fitting, the video was watched back 
with the shape data overlaying the video. Frames where the shape data were not 
good were noted (see Figure 4.1 for an example of a frame with poor fitting). A 
selection of these frames was added to the training set and the model was retrained 
and again fitted to the video. This process was repeated until it was decided that 
the fit was good enough for analysis of the video. This point was reached when 
the training set contained 25 training images. At this point we randomly sampled 
100 frames from the video to evaluate the quality of the fitting by eye. 93% were 
fitted perfectly and the other 7% were fitted very well (i.e., they had no more than 
2 points out of place, and these points were out of place by no more than a few 
pixels). None of the frames in the sample of 100 were fitted poorly. 




Figure 4.1: The image shows an example of a frame where the shape data poorly 
fit the face because not enough training images have been used. We later added 
this frame to the training set so that the model was able to correctly fit faces where 
the lips were concealed (such as in this frame). 



4.2.2 Analysing the video 



The video was recorded at 15 frames per second. We selected a 6,139-frame seg- 
ment (amounting to just under 7 minutes) where an interesting range of facial 
expressions were made. We then used the fitting and our measure of asymmetry 
(described in Section 3.4.1) to plot the graph shown in Figure 4.2. The graph 
shows facial asymmetry by frame for the 7-minute segment of video. Seven frames 
have been added to the graph at interesting points and we now discuss these. 

• Image 1 illustrates that blinking has a small effect on our measure of asym- 
metry, increasing asymmetry slightly. Inspection revealed that this was due 
to the corner of the subject's left eye lowering slightly more than the corner 
of his right eye during blinking. Investigation of individual frames revealed 
that many of the small spikes on the chart were due to blinking. 

• Image 2 marks the point where asymmetry first exceeds a value of 25. This 
was the result of a broad smile. Inspection revealed that the subject's smile 
was more asymmetric than his neutral face because his mouth pulled slightly 
more to the left side of his face than the right during smiling, and his right 
eye narrowed more than his left eye. Further peaks of around 25 (see images 3 
and 7 on the graph) were also caused by broad smiles. 

• Image 4 marks the point of greatest facial asymmetry, which was caused by a strong 
laugh. Inspection of the laugh revealed the points of asymmetry; the sub- 
ject's left eyebrow arched more than his right and the left side of the lower lip 
had stronger curvature than the right side. As with his smile, the subject's 
right eye closed a little more than his left. 

• Interestingly, the three points where facial asymmetry dropped below 15 (and 
where the face was therefore most symmetrical) were due to pursing of the 
lips, which compacted and rounded the lips, as illustrated in Image 5. 

• Raising eyebrows typically increased asymmetry due to the left eyebrow rais- 
ing and arching more than the right. Image 6 is an example of this. 



4.3 Measuring left-sided, right-sided and overall facial 
movement 



Figure 4.2 suggests that smiles and laughter are associated with increased facial 
asymmetry. This can be extended to a more general hypothesis that the degree of 
asymmetry increases as the strength of the "happiness expression" increases (for 
example, a slight smile is more asymmetric than a neutral face; a strong smile 
is more asymmetric than a slight smile; and laughter is more asymmetric than 
a strong smile). To further investigate this hypothesis, we needed a measure of 
the degree of happiness expressed. To achieve this, we designated a frame in the 
opening part of the video as the "neutral frame" (the frame is shown in Figure 4.3). 
We decided to measure "degree of happiness" as overall movement away from this 
neutral face. This approach is only valid if the video only contains either neutral or 
happy expressions. If the video also contained - for example - fearful expressions, 
then we could not simply identify the level of movement from the neutral face with 
the degree of happiness expressed. 




Figure 4.3: The image shows the frame designated as the neutral frame of the 
video. Overall movement (and left-sided and right-sided movement) was measured 
as movement from this frame. 

To measure movement from the neutral face we measured left-sided and right- 
sided movement from the neutral face separately and used the average as a measure 
of overall movement. The reason that we did this was to connect our work with 
topics in neuropsychology discussed in Chapter 2. Recall that the right hemisphere 
hypothesis conjectures that the right cerebral hemisphere is responsible 
for processing both positive and negative emotions whereas the valence hypothesis 
conjectures that the right hemisphere is only responsible for negative emotions and the 
left hemisphere is responsible for positive emotions. Recall also that the right hemisphere 
hypothesis predicts greater movement on the left side of the face than on the right 
during the expression of emotions whereas the valence hypothesis predicts greater 
movement on the left side for negative emotions and greater movement on the 
right for positive emotions. To see if our video confirms either hypothesis, we 
would need measures of left-sided movement and right-sided movement from the 
neutral face. In the next section we explain how we developed these measures. 

4.3.1 Developing the measures 

Building a measure of left-sided and right-sided movement on top of our measure 
of asymmetry was relatively straightforward. Shape data for the fitted neutral 
frame are stored. These shape data are separated into two sets: data for the left 
side of the face and data for the right side of the face. Given a frame, to measure 
left-sided movement from neutrality, we apply Procrustes analysis to the shape 
data for the left side of the face to determine the translation and rotation that 
minimises the Procrustes distance to the shape data for the left side of the neutral 
face. The minimised Procrustes distance is our measure of left-sided movement 
from the neutral frame. To measure right-sided movement, we do the same for the 
right side of the face. Overall movement from the neutral face is simply defined 
as the average of left-sided and right-sided movement. Figure 4.4 illustrates the 
calculation of left-sided movement for a frame of the video where the subject is 
smiling. The top row shows the neutral frame on the left and the frame with the 
subject smiling on the right. We want to calculate the degree of shape difference 
between these two frames. On the second row we see the left-sided shape data for 
each frame. And on the third row we see these points after Procrustes analysis 
has been used to minimise their Procrustes distance using only translation and 
rotation. The Procrustes distance - which is our measure of degree of movement 
from the neutral frame - is 74.94. 
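
The calculation just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's code: the function names procrustes_distance and movement_from_neutral are hypothetical, landmarks are (x, y) tuples in matching order, and the Procrustes distance is taken to be the square root of the summed squared distances between corresponding points after the best translation and rotation (the exact normalisation used in the project is not restated here).

```python
import math


def centre(points):
    """Translate a list of (x, y) points so that their centroid is the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def procrustes_distance(points, reference):
    """Distance between two point sets after optimal translation and rotation.

    No reflection or scaling is applied, matching the description in the text.
    """
    src, ref = centre(points), centre(reference)
    s = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(src, ref))
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(src, ref))
    theta = math.atan2(s, c)  # closed-form optimal rotation angle in 2D
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    squared = 0.0
    for (ax, ay), (bx, by) in zip(src, ref):
        rx = ax * cos_t - ay * sin_t
        ry = ax * sin_t + ay * cos_t
        squared += (rx - bx) ** 2 + (ry - by) ** 2
    return math.sqrt(squared)


def movement_from_neutral(frame_left, frame_right, neutral_left, neutral_right):
    """Return (left-sided, right-sided, overall) movement from the neutral frame."""
    left = procrustes_distance(frame_left, neutral_left)
    right = procrustes_distance(frame_right, neutral_right)
    return left, right, (left + right) / 2
```

Because each side is aligned to the same side of the neutral face, a rigid shift or rotation of the whole head contributes nothing to the result; only changes in the shape of the landmarks register as movement.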




Figure 4.4: The figure illustrates how we calculate left-sided movement between 
the neutral frame (on the left of the top row) and another frame. The result 
of Procrustes analysis to the two sets of points is shown on the third row. The 
Procrustes distance - which is our measure of degree of movement from the neutral 
frame - is 74.94. 

4.3.2 Charting left and right-sided movement 



With our measures of left-sided and right-sided movement we plotted the graph in 
Figure 4.5 for the same video that was used for Figure 4.2. The first frame (i.e., 
frame 0) was designated as the neutral face because it contained an expressionless 
face. The y-axis records movement from this neutral frame (which means that the 
graph starts at (x, y) = (0, 0)). Magenta is used to plot movement of the left 
side of the subject's face and red is used to plot movement of the right side of 
the subject's face (where left and right are from the subject's perspective). We 
have annotated five peaks on the graph with the frames that correspond to those 
peaks. Reassuringly, peaks coincide with frames where the face shows a happy 
expression. Moreover, the only point where movement exceeds 100 coincides with 
the point where the happy expression is strongest and the subject is laughing. This 
makes sense because the subject's laughter involves not only movement of his lips 
and nose, but also the raising of his eyebrows. We can also see that the magenta line 
appears to be higher than the red line for nearly all frames. The data confirm this: 
movement of the left side of the subject's face exceeded movement of the right side 
in 97.3% of the video's frames. 

Figure 4.5: The chart shows facial movement from the neutral face on the y-axis 
against frame number on the x-axis for a video of the subject watching a comedy 
program. The neutral face was defined as the shape data for the face in frame 0 
(where the subject's face was expressionless). This means that the graph starts 
at the point (0, 0). The magenta line records movement of the left side of the 
subject's face and the red line records movement of the right side of the subject's 
face (where left and right are defined from the subject's perspective). 



Figure 4.6 is a scatter graph plotting left-sided movement against right-sided 
movement for the video (each point of the graph represents a frame). The straight 
line y = x is drawn in red to show that most points lie above the line which means 
that the majority of frames exhibit more movement on the subject's left side than 
on his right side. We also see that movement on the left side is closely correlated 
with movement on the right side. The Pearson correlation coefficient was measured 
to be 0.996, which indicates a very strong linear relationship (a coefficient of 1 means a 
perfect correlation). What we can conclude, then, about the relation between left 
and right-sided movement (for this particular subject, expressing the particular 
emotion "happiness") is that left and right movement happen simultaneously but 
that the degree of left movement nearly always exceeds the degree of right move- 
ment. This latter fact counts as evidence against the valence hypothesis and in 
favour of the right hemisphere hypothesis (because the valence hypothesis predicts 
greater right-sided movement for positive emotions whereas the right hemisphere 
hypothesis predicts greater left-sided movement for positive emotions). 
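
The two statistics quoted in this section (the Pearson coefficient and the share of frames lying above the line y = x) can be computed as in the sketch below. The function names are hypothetical and the inputs are assumed to be two equal-length lists holding the left-sided and right-sided movement for each frame; the sketch is not the analysis code used for the reported figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def fraction_left_dominant(left, right):
    """Share of frames whose left-sided movement strictly exceeds right-sided movement.

    These are the frames plotted above the line y = x when left-sided movement
    is on the y-axis and right-sided movement is on the x-axis.
    """
    return sum(l > r for l, r in zip(left, right)) / len(left)
```

A high coefficient together with a high left-dominant share corresponds to the pattern reported for the first subject: both sides move together, but the left side moves further.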




[Figure 4.6 plot. x-axis: movement of right side of subject's face (from neutral face).] 



Figure 4.6: The figure shows a scatter plot of left-sided movement on the y-axis 
and right-sided movement on the x-axis. The line y = x is drawn in red. 

4.3.3 Charting asymmetry against overall movement 

One of the reasons for developing measures of left and right-sided movement 
was that they can be used to obtain a measure of overall movement (by averaging 
left and right-sided movement), and we can then see whether overall movement is 
correlated with asymmetry. Figure 4.7 is a scatter graph showing overall movement 
from the neutral face on the x-axis and asymmetry on the y-axis. Remember that 
we are using overall movement as our measure of the strength of happiness ex- 
pression. The graph shows a clearly positive correlation (the Pearson coefficient is 
0.77), with an increase in the strength of happiness typically being accompanied 
by an increase in asymmetry. We can see that the relationship is not linear; it 
seems that earlier increases in the strength of expression (for example, increases 
from 20 to 40) make less of a difference to asymmetry than later increases (for 
example, increases from 80 to 100). We have annotated the plot with some frames 
from the video. Values of x between 20 and 60 are associated with slight smiles 
or smirks; asymmetry for these faces is not much greater than asymmetry of the 
neutral face (which has asymmetry of 15.81). Values of x between 70 and 90 are 
associated with broader smiles which increase the level of asymmetry to around 20 to 
25. The top-right small cluster of points marks the period of the video when the 
subject was laughing, which was associated with the greatest asymmetry. There 
are a few points that lie outside the main body of points. One of these outliers 
is annotated at the top-left of the graph. This is a frame where the ratio of 
asymmetry to strength of expression is higher than for most frames. The reason for 
this is that the subject's tongue is distorting the shape of the mouth, increasing 
asymmetry, but the face is still fairly close to the neutral face. 



4.4 Analysing a second subject 

To compare the results obtained thus far, a 10-minute video of another subject 
watching a comedy program was recorded. The second subject was male, one 
year younger than the first subject and - like the first subject - right-handed. As 
with the first video, we selected a segment of the video that captured a range of 
emotional expression. This segment contained 5,817 frames and was six and a half 
minutes in length. We trained an AAM for the video until we achieved fitting 
performance equivalent to the performance for the first video. We then ran the 
asymmetry measure on the segment to return 5,817 data points. For each data 
point we had a measure of asymmetry, a measure of left-sided movement from the 
neutral frame (where the neutral frame was the first frame, where the subject's 
face was expressionless) and a measure of right-sided movement from the neutral 



frame. We were then able to plot scatter graphs analogous to Figures 4.6 and 4.7 
for this second subject. 

[Figure 4.7 plot. x-axis: strength of "happy expression".] 



Figure 4.7: The figure shows a scatter graph displaying strength of happiness ex- 
pression on the x-axis and asymmetry on the y-axis. The graph shows a clearly pos- 
itive correlation (the Pearson coefficient is 0.77), with an increase in the strength 
of happiness typically being accompanied by an increase in asymmetry. 

Figure 4.8 is a scatter graph showing left-sided movement from the neutral 
face on the y-axis, and right-sided movement from the neutral face on the x-axis. 
Comparing this graph to the graph in Figure 4.6, there are two important 
differences. The first is that movement for both sides of the face ranges from values 
of 0 to just over 60, whereas for the first video, movement ranged from values of 
0 to over 100. This suggested that the expressions of happiness elicited from the 
second subject were not as strong as those of the first subject. Watching the video showed 
that this was indeed the case. Either through inhibition, or because the comedy 
program was not funny enough, whilst the subject smiles and grins at points, 
the subject does not laugh at any point in the video. The second difference to 
Figure 4.6 is that the right side of the face shows dominance over the left, at least 
for strong expressions (in contrast, for the first subject, a clear left-sided dominance 
was found, with 97.3% of frames showing greater left-sided movement than right- 
sided movement). 85.3% of frames showed average movement of less than 30. 
Of these frames, 48.4% of frames showed greater left-sided movement, and 51.6% 
showed greater right-sided movement. However, for the frames that showed average 
movement greater than 30, only 7% showed greater left-sided movement and 93% 
showed greater right-sided movement. Our findings for the second subject provide 
support to the valence hypothesis, which predicts greater right-sided movement 
for expression of positive emotions. 
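
The breakdown by strength of expression reported above can be reproduced with a short routine such as the following sketch. The function name dominance_by_strength and the group labels are hypothetical; the threshold of 30 is taken from the text, and frames whose average movement is exactly equal to the threshold are assigned to the stronger group, which is an assumption made for the sketch.

```python
def dominance_by_strength(left, right, threshold=30.0):
    """Split frames by average movement and report left/right dominance per group.

    left, right: per-frame movement of each side of the face from the neutral frame.
    Frames with average movement below the threshold form the "weak" group;
    all other frames form the "strong" group (assumed treatment of ties).
    """
    groups = {"weak": [], "strong": []}
    for left_mv, right_mv in zip(left, right):
        key = "weak" if (left_mv + right_mv) / 2 < threshold else "strong"
        groups[key].append((left_mv, right_mv))

    total = sum(len(pairs) for pairs in groups.values())
    summary = {}
    for name, pairs in groups.items():
        n = len(pairs)
        summary[name] = {
            "share_of_frames": n / total if total else 0.0,
            "left_greater": sum(l > r for l, r in pairs) / n if n else 0.0,
            "right_greater": sum(r > l for l, r in pairs) / n if n else 0.0,
        }
    return summary
```

Reporting the two groups separately matters here because the right-sided dominance found for the second subject only appears in the stronger expressions; a single figure over all frames would hide it.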

We can also plot a scatter graph of magnitude of asymmetry against strength 
of "happiness expression" for the second subject, analogous to Figure 4.7. This 
is shown in Figure 4.9. We have added 4 images from the video to annotate 
the graph. The left-most image shows the subject with a neutral face and the 
degree of asymmetry is low. The next image (from the left) shows the subject 
with some slight movement away from the neutral face (movement is just under 
30) and asymmetry elevated slightly, but not by much. The third image shows the 
subject with a slight smile. As with the second image, asymmetry is elevated, but 
only slightly (compare, for instance, the range of asymmetry values for the first 
subject in Figure 4.7 - asymmetry gets close to 40 when the subject is laughing). 
The right-most image shows one of the most expressive frames of the video (i.e., 
a frame with high movement from the neutral face). The subject is seen to be 
smiling and the asymmetry value is around 20. 

It is hard to infer from Figure 4.9 that asymmetry increases as the strength of 
expression increases for our second subject. Asymmetry is seen to increase slightly 
as the strength of expression increases, but the trend is not as clear as for our 
first subject. It is worth noting that strength of expression ranges from 0 to 60 
for the second subject in contrast to the first subject where it exceeded 100 when 
the subject was laughing strongly. If we were able to record a second video of 
the second subject that contained strong laughter, then we could see if stronger 
expressions of happiness are associated with larger increases in asymmetry, as for 
our first subject. 

[Figure 4.8 plot. x-axis: movement of right side of subject's face (from neutral face).] 

Figure 4.8: The figure is a scatter graph for the second subject of left-sided movement 
(from the neutral face) on the y-axis and right-sided movement (from the 
neutral face) on the x-axis. The line y=x is drawn in red. 

[Figure 4.9 plot. x-axis: strength of "happy expression".] 

Figure 4.9: The figure shows a scatter graph of facial asymmetry on the y-axis and 
strength of the "happiness expression" on the x-axis for our second subject. 

This concludes the chapter. In the next and final chapter we review our work 
and subject it to a critical evaluation. 

Chapter 5 

Conclusion 



In this chapter we summarise our project and report; submit the former to a 
critical evaluation; and suggest possible future directions for research. 

5.1 Summary of project and report 

Our project was to: 

• develop an automatic frame-by-frame measure of facial asymmetry in videos 
of faces that improved on previous measures (automatic or otherwise), and 

• use the measure to analyse the relationship between facial asymmetry and 
emotional expression, and connect our findings with previous research of the 
relationship. 

Our report was divided into three chapters (excluding the introductory chapter 
and this chapter). In Chapter 2 we provided the reader with the motivation for 
our work and its theoretical basis. We began by showing why anyone should want 
to measure facial asymmetry. Psychologists have long been interested in the asym- 
metry of faces in static, neutral poses (and have connected the degree of asymmetry 
to attractiveness, health and personality type) and neuropsychologists have inves- 
tigated the dynamics of asymmetry during emotional expression, to understand 
emotional processing in the brain. We argued that there was an opportunity to 
improve on previous measures of facial asymmetry and provide neuropsychologists 
with new tools to collect data concerning the relationship between asymmetry and 
emotional expression. Earlier measures relied on human input (to measure any 
given image) and were therefore time-consuming, limiting the quantity of data 
that could be collected for a study. More recent measures (based on techniques 
from computer vision) that could theoretically collect larger datasets were found to 
be limited in number. Active Shape Models and Active Appearance Models were 
selected as the basis of a new, automatic measure of facial asymmetry, without 
the limitations of those previously discussed, and the theory underlying them was 
presented in detail. 

In Chapter 3 we thoroughly described the work that was done to develop two 
measures of facial asymmetry. After selecting an AAM library (and justifying its 
selection) we established the library's capability by training it on a test video and 
evaluating its fitting performance. We then used the library as the basis for the 
development of the two measures. The first measure used shape data to measure 
asymmetry; shape data for each side of the face were collected and Procrustes 
analysis was used to minimise the Procrustes distance between the shape data for 
the right side and the shape data for the left side (through reflection, rotation and 
translation). The minimised distance was taken as the degree of asymmetry. The 
second measure used both shape data and texture data. The former were used 
to calculate the axis of symmetry for the face. Texture was then reflected across 
the axis, and asymmetry was defined as the mean absolute difference between 
corresponding pixel values for the reflected and unreflected sides of the face. The 
two measures were critically compared, and the first was deemed to be preferable 
for several reasons. Two limitations of the first measure were identified: first, 
that it was only as good as the quality of the shape data provided by the fitting; 
second, that it could be affected by lateral head rotation, which therefore had to 
be eliminated by - for example - attaching the camera to the subject's head. 

Chapter 4 described applications of the measure and analysed data that were 
collected. Due to time constraints, a single emotion - happiness - was selected for 
analysis. A seven-minute segment of video of a subject viewing a comedy program 
was analysed in detail. Several interesting results were found. Analysis of a graph 
of asymmetry against frame number (Figure 4.2) highlighted frames of elevated 
asymmetry in the video; analysis of the shape data for these frames identified sur- 
prising causes of asymmetry (such as the subject raising his eyebrows). Measures 
of left and right facial movement (relative to a neutral frame) were developed and 
were used to plot a scatter graph of left-sided movement against right-sided move- 
ment, which confirmed a clear bias in favour of left-sided movement for the subject 
(Figure 4.6). In contrast, results for a second subject (Figure 4.8) indicated a bias 
in favour of right-sided movement (at least for stronger expressions). The results 
were connected with our research into topics in neuropsychology; the results for 
the first subject supported the right hemisphere hypothesis and the results for the 
second subject supported the valence hypothesis. Scatter graphs of asymmetry 
against the strength of the "happiness expression" were also plotted. A positive, 
but non-linear, relationship between the variables was observed (Figure 4.7) for 
the first subject. The results for the second subject (Figure 4.9) were inconclusive. 

5.2 Critical evaluation of work 



To perform a critical evaluation of the project we will consider its strengths and 
weaknesses in turn. 

5.2.1 Strengths 

The first aim of the project was to develop an automatic frame-by-frame measure 
of facial asymmetry in videos of faces that improved on previous measures. Two 
measures of asymmetry were developed, and the first was preferred, for reasons 
discussed in Section 3.4.3. We believe that the development of the first measure 
satisfies the first aim of our project. Given a video, and an ASM trained for that 
video, our measure returns a value of asymmetry for every frame, without human 
involvement (i.e., it is automatic). To be sure, training of the ASM involves human 
input, as training images need to be annotated by hand. However, we have found 
that a high-quality model for a ten-minute video can be trained from as few as 
25 images (Section 4.2.1), where each image takes 2 to 3 minutes to annotate. A 
ten-minute video (at 15 frames per second) yields 9,000 frames of asymmetry data. 
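
The figures quoted above can be checked with simple arithmetic, using only the numbers stated in this section:

```latex
\[
10\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}} \times 15\ \tfrac{\text{frames}}{\text{s}} = 9{,}000\ \text{frames},
\qquad
25\ \text{images} \times (2\ \text{to}\ 3)\ \tfrac{\text{min}}{\text{image}} = 50\ \text{to}\ 75\ \text{min}.
\]
```

That is, roughly an hour of manual annotation supports about 9,000 automatic measurements, which works out at between a third and a half of a second of annotation effort per measured frame.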

Our measure improves on earlier measures - that relied wholly on human mea- 
surement - because it can collect more data for analysis. (To measure the asymme- 
try of 9,000 frames, by hand, would likely take several days.) But it also improves 
on other automatic measures of asymmetry in the literature. To our knowledge, 
no one has previously used ASMs or AAMs to measure facial asymmetry; nor has 
anyone previously developed an automatic Procrustes-based measure. As we saw 
in Section 2.3, when we looked at other automatic measures, Desai (2009) relied on 
an entropy-based measure (that looked at average changes in pixel values across 
each side of the face) and Nicholls et al. (2004) used a 3-dimensional range finder 
to detect movement. Each measure had self-confessed limitations. Desai assumed 
that "changes in the surface lighting of the face reflect movement" and his mea- 
sure was therefore affected by lighting variation, and Nicholls and colleagues stated 
that they could only measure movement perpendicular to the surface of the face. 
However, the main limitation of both measures, in our view, was that they treated 
every pixel or coordinate of the face equally, and were unable to ignore asymme- 
tries that, for whatever reason, we may wish to exclude from our measure (such 
as movement of the eyes - which cannot be ignored by Desai - or uneven raising of 
the eyebrows - which cannot be ignored by Nicholls). A Procrustes-based measure, 
such as ours, is based only on the chosen set of landmarks, and so we can choose to 
ignore any feature of the face. Furthermore, if we want to increase the weighting 
of a feature in the measure, we need only add extra points to it. 
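
The flexibility described above can be sketched as a preprocessing step applied to the landmarks before the measure is computed. This is a minimal illustration under stated assumptions, not part of the project's code: the Landmark type, the feature labels and the selectLandmarks function are hypothetical, and repeating a landmark here stands in for annotating additional points on the same feature.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical landmark record: a feature label plus image coordinates.
struct Landmark {
    std::string feature;
    double x, y;
};

// Build the landmark list that a Procrustes-based measure would consume.
// A weight of 0 excludes a feature, a weight of k > 1 repeats each of its
// points k times, and features without an entry keep a weight of 1.
std::vector<Landmark> selectLandmarks(const std::vector<Landmark>& all,
                                      const std::map<std::string, int>& weights) {
    std::vector<Landmark> out;
    for (const Landmark& lm : all) {
        const auto it = weights.find(lm.feature);
        const int w = (it == weights.end()) ? 1 : it->second;
        for (int k = 0; k < w; ++k) {
            out.push_back(lm);
        }
    }
    return out;
}
```

For example, giving the eyes a weight of 0 removes eye movement from the measure, while giving the mouth a weight of 2 doubles the contribution of its points to the sum of squared distances.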

It is for these reasons that we believe that our novel measure, as an automatic 
Procrustes-based measure, improves on earlier measures. First, it improves on 
earlier measures which rely on human measurement because it can collect more 
data. And second, it improves on measures that are not Procrustes-based, because 
of its ability to specify and weight the features of the face that count towards the 
measure. 

The second aim of the project was to use our measure to investigate the re- 
lationship between facial asymmetry and emotional expression, and connect our 
findings with previous research of the relationship. We pursued this aim in 
Chapter 4. A seven-minute segment of video (of just over 6,000 frames) of the author 
watching a comedy program was analysed in detail. Novel Procrustes-based mea- 
sures of left and right-movement from a neutral frame were developed (it should 
be noted that these measures share the same strengths as our measure of asym- 
metry; namely, they are automatic and allow the user to choose which features 
of the face are tracked). We discovered a clear bias in favour of left-sided facial 
movement for the subject during expressions of happiness, and we connected this 
result to work in the literature (as support for the right hemisphere hypothesis). 
For a second subject, we discovered a bias in favour of right-sided facial movement 
during stronger expressions of happiness (and counted this as support for the va- 
lence hypothesis). We used our measures of left and right-movement to define a 
measure of the strength of the "happiness expression". To our knowledge, no one 
in the literature has previously investigated the relationship between the strength 
of an emotional expression and the degree of its asymmetry (studies have instead 
focussed on the direction of asymmetry - i.e., which side of the face moves more - 
rather than on its magnitude). Figure 4.7 illustrates the discovery of an interesting 
non-linear but positive relationship between the strength of the "happiness 
expression" and its degree of asymmetry for our first subject. The data for our second 
subject do not rule out the same relationship. On the basis of our analysis, we 
were able to form a hypothesis worthy of future research: that magnitude of asym- 
metry increases with strength of emotional expression, with stronger expressions 
especially associated with stronger asymmetry. 

Although these results are limited to two subjects and a single emotion, similar 
data could be collected for further subjects and emotions. We discuss this further 
in Section 5.3. 



5.2.2 Weaknesses and possible improvements 

We believe that there are three genuine weaknesses of our project, and one aspect 
of the project that could be perceived as a weakness, but that is not a genuine 
weakness. Two of the three genuine weaknesses apply to our measure, and the 
other applies to our analysis of the relationship between asymmetry and emotional 
expression. 

The aspect of the project that could be perceived as a weakness, but that 
is not a weakness, is that time was invested into developing a second measure 
of asymmetry (Section 3.4.2), but the measure was eventually discarded and the 
first measure was preferred. And further, if time had not been invested into the 
second measure, there would have been more time to collect and analyse data for 
additional subjects. There are two reasons why this is not a genuine weakness 
of the project. The first reason is that development of the second measure was 
necessary to realise its weaknesses, and to appreciate the strengths of the first 
measure. The chief advantage of the first measure over the second measure is its 
flexibility; because the measure is based only on the chosen set of landmarks, the 
user can choose to ignore any feature of the face that they wish to, and focus on 
certain features of the face. We saw in Section 5.2.1 that this is also the chief 
advantage of the first measure over other recent measures in the literature. It was 
by developing the second measure that we were able to perceive this advantage. 
The second reason why developing the second measure was not a weakness of the 
project is that even if we had collected data for additional subjects, we would - 
at most - have had time to collect data for 5-10 subjects, and this sample size 
would not have been large enough to draw generalisable conclusions worthy of 
publication, and so we feel that focussing on the development of the measures was 
more valuable. 

The first genuine weakness of our measure is that it is affected by lateral rota- 
tion of the subject's head, which means that our measure can only reliably analyse 
videos where lateral rotation is absent or negligible (see Section 3.5.2). Since lat- 
eral rotation of a subject's head is natural during speech and emotional expression, 
our measure can only be performed on videos where the subject has been asked 
not to turn their head sideways, or where the camera has been attached to the 
subject's head to eliminate rotation. This means that our measure can only be 
used for specially recorded videos and cannot be applied to - for example - corrob- 
orate asymmetry measures in earlier studies. However, whilst this is a limitation 
of our measure, we do not believe that it is a serious one. First, new studies into 
asymmetry and emotional expression will most likely want to collect new data, 
and attaching the camera to the subject's head is nothing more than an inconve- 
nience. And second, the limitation provides an opportunity to undertake research 
into building on our measure so that it can cope with lateral head rotation. 

The second genuine weakness applying to the measure is that training a model 
for a subject involves manually annotating images, which typically takes 2-3 min- 
utes per image, and therefore limits the amount of data that can be collected for 
a study. It should be noted that this is not a weakness of the Procrustes-based 
approach to measuring asymmetry or of our implementation of it. Rather it is a 
consequence of the fact that the AAM library that was selected required training 
and did not provide a pre-trained model. In addition, we should say that we never 
experimented with a model trained on more than 25 training images. It may be 
that if we were conducting a large study of facial asymmetry and emotional ex- 
pression with, for instance, 100 subjects, we could train the model on a large set 
of training images (i.e., several hundred) drawn from across the study, and the 
model would fit all subjects well and generalise to new subjects. However, the 
initial training would still involve a substantial time investment, and so a possible 
improvement of our measure is to integrate it with a method of fitting that does 
not require training. In communication with Alexander Davies (PhD student at 
the University of Bristol), it has recently come to our attention that a face tracker 
due to Jason Saragih (based on Saragih et al. (2011)) performs well on a range of 
videos (accurately locating facial features) and comes with a pre-trained model, 
and requires no additional training. It may be that combining this face tracker and 
our Procrustes-based measure of asymmetry could yield a fully automatic measure 
of asymmetry. 

The final weakness of our project, that applies to our analysis of the relation- 
ship between asymmetry and emotional expression, is that we only collected data 
for two subjects and a single emotion (happiness). The reason for this was simply 
that developing and assessing the measures, and producing this report, took the 
majority of the time available for the project. However, even though we were un- 
able to analyse data for more than two subjects, we believe that we have prepared 
the ground for further research worthy of publication. In particular, our analysis 
of the correlation between the magnitude of facial asymmetry and strength of 
emotional expression is original (i.e., Figures 4.7 and 4.9) and allows us to postulate 
a hypothesis worthy of further investigation: that magnitude of asymmetry 
increases with the strength of emotional expression where strong expressions (such 
as laughter) are especially associated with strong asymmetry. Furthermore, our 
measure could be used for further research into determining whether positive emo- 
tions are associated with greater movement of the left side of the face or of the 
right side (or of neither side). This remains an open question in the literature. 

5.3 Future directions 

We end the report by suggesting three interesting and fruitful ways to build on 
the research undertaken. 

The first way is to attempt to develop the measure so that it can cope with 
lateral head rotation. If this could be accomplished, new videos for analysis could 
be recorded without the camera attached to the subject's head, and videos from 
previous studies on emotional expression could be analysed. This is probably the 
most difficult of the three suggested paths for future research. As we saw in 
Section 3.5.2, our attempt to handle lateral rotation using OpenCV's homography 
functions was unsuccessful. What is needed, it seems, is a way of estimating the 
depth of each point on the face (i.e., distance from the camera lens). This informa- 
tion could then be used to more accurately estimate the degree of lateral rotation 
of the head, and factor this degree into the measure of asymmetry. However, we 
remain sceptical at this point that a 2-dimensional measure could be developed that 
was wholly unaffected by head rotation, and further research is certainly needed. 

The second way to build on our research is to focus on increasing the degree 
of automation of our measure, by integrating it with a face tracker that does not 
require manual training. Alexander Davies has recently started using a face tracker 
due to Jason Saragih (based on Saragih et al. (2011)) that does not need training 
and - according to Alexander - can effectively fit new faces. A fruitful path for 
future research would be to integrate this face tracker with our Procrustes-based 
measure; the tracker would be used to locate facial landmarks, and our measure 
would be used to calculate the degree of asymmetry from the landmarks. The 
end result would be a measure of facial asymmetry that is fully automatic. The 
benefit over our measure would be that a new study of facial asymmetry - with, 
for instance, 100 subjects - would not require a laborious training stage involving 
several hundred training images and taking several hours. 

The final way of building on our research is to use the developed measure for 
a substantial study (of as many subjects as possible) that seeks to answer the 
following: 

• Is emotional expression, for positive emotions, associated with greater move- 
ment on the left side of the face (as the right hemisphere hypothesis predicts), 
or the right side of the face (as the valence hypothesis predicts)? We have 
found greater movement on the left side for one subject, but greater move- 
ment on the right side for another. Time permitting, we would have liked to 
have collected data for additional subjects. 

• Is strength of emotional expression positively correlated with degree of facial 
asymmetry? For happiness, and for one subject, we have found that it is. For 
a second subject, the data are inconclusive. Our measure provides the op- 
portunity to collect data for further subjects, and determine the correlation. 
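
As a sketch of how the second question could be quantified once per-frame values are available, the function below computes a Pearson correlation coefficient between two equally long series (for instance, expression strength and asymmetry). This is an assumption about a possible analysis, not a method used in the project; the name pearsonCorrelation is hypothetical, and because the relationship observed in Figure 4.7 was non-linear, a rank-based coefficient may be more appropriate in practice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equally long series.
// Assumption: both series have non-zero variance, otherwise the
// denominator is zero and the result is not a number.
double pearsonCorrelation(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size() && a.size() > 1);
    const double n = static_cast<double>(a.size());

    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;

    // Accumulate the covariance and the two variances (unnormalised).
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    return sab / std::sqrt(saa * sbb);
}
```

A value close to +1 would indicate that asymmetry rises with expression strength across the sampled frames, and a value close to 0 would indicate no linear association.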

This concludes Chapter 5. We hope to have convinced the reader that our research 
has made valuable contributions to the problem of measuring facial asymmetry 
in videos, and performed original analysis into the relationship between facial 
asymmetry and emotional expression, that is interesting in its own right, and that 
can teach us about emotional processing in the brain. We hope further that we, 
or other researchers, have the opportunity to build on our research by following 
one of the paths described above. 

Appendix A 



ASM and AAM software 
implementations 



The table below displays the URLs associated with the ASM and AAM software 
implementations presented in Section 3.2. 

     Author               URL
1    Ghassan Hamarneh     http://www.cs.sfu.ca/~hamarneh/software/asm/index.html
2    Tim Cootes           http://personalpages.manchester.ac.uk/staff/timothy.f.cootes/software/am_tools_doc/index.html
3    Stephen Milborrow    http://www.milbo.users.sonic.net/stasm/
4    Yao Wei              http://code.google.com/p/asmlibrary/
5    Mikkel B. Stegmann   http://www2.imm.dtu.dk/~aam/
6    George Papandreou    http://cvsp.cs.ntua.gr/software/AAMtools/
7    Yao Wei

Appendix B 

A theorem about reflection 



The following theorem was required in Section 3.4.2. Its proof is provided here. 

Theorem. Reflection of the point $(x', y')$ in the line $y = mx$ takes the point 
to: 

\[
\left( \frac{2my' - (m^2 - 1)x'}{1 + m^2},\ \frac{(m^2 - 1)y' + 2mx'}{1 + m^2} \right)
\]

Proof. Let $L_1$ be the line $y = mx$. Let $L_2$ be the line perpendicular to $L_1$ that 
passes through $(x', y')$. Then $L_2$ has the form $y = -\frac{x}{m} + c$. Let $(x'', y'')$ be the 
intersection of $L_1$ and $L_2$. Since $(x', y')$ lies on $L_2$ and $(x'', y'')$ lies on $L_1$ and $L_2$, 
we have the following: 

\[ y' = -\frac{x'}{m} + c \tag{B.1} \]

\[ y'' = -\frac{x''}{m} + c \tag{B.2} \]

\[ y'' = mx'' \tag{B.3} \]

We have three equations and three unknowns ($x''$, $y''$ and $c$). Combining 
Equations B.1 and B.2 yields $y' + \frac{x'}{m} = y'' + \frac{x''}{m}$. Substitute in Equation B.3 
to find that $y' + \frac{x'}{m} = mx'' + \frac{x''}{m}$. Thus, 

\[ x'' = \frac{my' + x'}{1 + m^2}, \qquad y'' = \frac{m(my' + x')}{1 + m^2}. \]

The coordinates of the reflected point are $(x''', y''')$ where $x''' = 2x'' - x'$ and 
$y''' = 2y'' - y'$. Thus, the reflected point is 

\[
\left( 2 \cdot \frac{my' + x'}{1 + m^2} - x',\ 2 \cdot \frac{m(my' + x')}{1 + m^2} - y' \right)
\]

which, after some algebra, simplifies to the desired result. 

□
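
The closed form in the theorem translates directly into code. The following is a minimal sketch for checking the result numerically; the names Point2 and reflectInLine are hypothetical and do not appear in the project's source. Although the proof divides by $m$, the final expression is also well defined for $m = 0$, where it reduces to reflection in the x-axis.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2-D point type used only in this sketch.
struct Point2 { double x, y; };

// Reflect the point p in the line y = m * x, using the closed form
// stated in the theorem above.
Point2 reflectInLine(Point2 p, double m) {
    const double d = 1.0 + m * m;   // denominator, never zero
    Point2 r;
    r.x = (2.0 * m * p.y - (m * m - 1.0) * p.x) / d;
    r.y = ((m * m - 1.0) * p.y + 2.0 * m * p.x) / d;
    return r;
}
```

Two quick sanity checks follow from the geometry: a point that lies on the line is left unchanged, and reflecting twice returns the original point.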

Appendix C 

Source code excerpt 



In this appendix we provide a source code excerpt. The code below implements 
the first measure (i.e., the measure described in Section 3.4.1). The code needs to 
be run with the AAM library and OpenCV. 



#include "AAM_IC.h"
#include "AAM_Basic.h"
#include "AAM_MovieAVI.h"
#include "VJfacedetect.h"
#include "global.h"
#include "gnuplot_i.h"

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GREEN 1

using namespace std;

float getMedian(CvMat* pts);
void splitPoints(CvMat* pts_left, CvMat* pts_right, CvMat* pts, float median);
void pointsOnImage(IplImage* image, CvMat* pts_left, CvMat* pts_right, int frameno);
void printPts(CvMat* A);
void PrintMat(CvMat* A);
void pointsOnBlack(IplImage* image, CvMat* pts_left, CvMat* pts_right, char* windowname,
                   int frameno);
void doProcrustes(CvMat* T, CvMat* translation, CvMat* A, CvMat* B);
void GetMean(CvMat* src, CvMat* dst);
void translate_to_origin(CvMat* pts);
void normalise(CvMat* pts);
void translate(CvMat* src, CvMat* translation, CvMat* dst);
float ProcrustesDistance(CvMat* one, CvMat* two, FILE* f, int frameno);
bool inFrames_to_save(int a);
void overlayPoints(IplImage* dst, IplImage* image, CvMat* points, int colour);

/* prints the AAM library banner */
static void print_version()
{
    printf("\n\n"
           "/**************************************************************/\n"
           "/*  AAMLibrary: A C++ open source for face alignment          */\n"
           "/*  Copyright (c) 2008-2009 by Yao Wei, all rights reserved.  */\n"
           "/*  Contact: njustyw@gmail.com                                */\n"
           "/**************************************************************/\n"
           "\n\n");
}

/* prints the expected command-line arguments and exits */
static void usage()
{
    printf("Usage: fit model_file cascade_file (image/video)_file\n");
    exit(0);
}
int main(int argc, char** argv)
{
    print_version();

    if (argc != 4) usage();

    AAM_Pyramid model;
    model.ReadModel(argv[1]);
    VJfacedetect facedet;
    facedet.LoadCascade(argv[2]);
    char filename[100];
    strcpy(filename, argv[3]);

    if (strstr(filename, ".avi"))
    {
        AAM_MovieAVI aviIn;
        AAM_Shape Shape;
        IplImage* image2 = 0;
        bool flag = false;

        /* one line of output per frame: frame number and asymmetry */
        FILE* pFile;
        pFile = fopen("results_asymm.txt", "w");
        fputs("frame number, asymmetry\n", pFile);

        FILE* fp;
        fp = fopen("point_distances.txt", "w");

        aviIn.Open(filename);

        for (int j = 0; j < aviIn.FrameCount(); j++)
        {
            printf("Tracking frame %04i: ", j);

            IplImage* image = aviIn.ReadFrame(j);

            /* (re)initialise the shape from the face detector when needed */
            if (j == 0 || flag == false)
            {
                flag = model.InitShapeFromDetBox(Shape, facedet, image);
                if (flag == false)
                    continue;
            }

            model.Fit(image, Shape, 30, false);
            if (image2 == 0) image2 = cvCreateImage(cvGetSize(image), image->depth,
                                                    image->nChannels);
            cvZero(image2);
            model.Draw(image2, Shape, 2);

            /* note: points_all and NUMLANDMARKS are not declared in this excerpt */
            char filename[100];

            if (inFrames_to_save(j)) {
                sprintf(filename, "pointsImage%d.jpg", j);
                IplImage* imageallpts = cvCreateImage(cvGetSize(image), image->depth,
                                                      image->nChannels);
                overlayPoints(imageallpts, image, points_all, GREEN);
                cvSaveImage(filename, imageallpts);
                cvReleaseImage(&imageallpts);
            }

            float median = getMedian(points_all);

            CvMat* pts_left = cvCreateMat((points_all->cols) / 4, 2, CV_64FC1);
            CvMat* pts_right = cvCreateMat((points_all->cols) / 4, 2, CV_64FC1);
            CvMat* pts_right_rotated = cvCreateMat((points_all->cols) / 4, 2, CV_64FC1);
            CvMat* pts_right_translated = cvCreateMat((points_all->cols) / 4, 2, CV_64FC1);

            splitPoints(pts_left, pts_right, points_all, median);

            pointsOnImage(image, pts_left, pts_right, j);

            pointsOnBlack(image, pts_left, pts_right, "pointsblack", j);

            CvMat* T = cvCreateMat(2, 2, CV_64FC1);
            CvMat* translation = cvCreateMat(1, 2, CV_64FC1);

            doProcrustes(T, translation, pts_left, pts_right);

            /* apply the transformation T to the right-side points */
            cvGEMM(pts_right, T, 1, NULL, 0, pts_right_rotated, 0);
            pointsOnBlack(image, pts_left, pts_right_rotated, "rotatedpoints", j);

            translate(pts_right_rotated, translation, pts_right_translated);
            pointsOnBlack(image, pts_left, pts_right_translated, "translatedpoints", j);

            /* the remaining distance is recorded as the asymmetry of frame j */
            fprintf(pFile, "%d, %f\n", j, ProcrustesDistance(pts_left, pts_right_translated,
                                                            fp, j));

            sprintf(filename, "videocaps/frame%d.jpg", j);
            cvSaveImage(filename, image);
            CvMat* neutral_points_left = cvCreateMat(1, NUMLANDMARKS, CV_64FC1);
            cvWaitKey(1);
        }

        fclose(pFile);
        fclose(fp);
        cvReleaseImage(&image2);
    }

    else
    {
        IplImage* image = cvLoadImage(filename, -1);
        if (image == 0)
        {
            AAM_FormatMSG("CANNOT open image file %s\n", filename);
            AAM_ERROR(errmsg);
        }

        AAM_Shape Shape;
        bool flag = model.InitShapeFromDetBox(Shape, facedet, image);
        if (flag == false)
        {
            AAM_FormatMSG("The image doesn't contain any faces\n");
            AAM_ERROR(errmsg);
        }

        model.Fit(image, Shape, 50, true);
        model.Draw(image, Shape, 2);

        cvNamedWindow("Fitting");
        cvShowImage("Fitting", image);
        cvWaitKey(0);

        cvReleaseImage(&image);
    }

    return 0;
}

/* gets the median x coordinate for the set of points */
float getMedian(CvMat* pts)
{
    float xvalues[(pts->cols) / 2];

    /* x coordinates are stored at the even indices of pts */
    for (int i = 0; i < (pts->cols / 2); i++) {
        xvalues[i] = cvGetReal1D(pts, (2 * i));
    }

    int changes;
    float tmp;

    /* bubble sort: repeat until a pass makes no swaps */
    do {
        changes = 0;
        for (int j = 0; j < (pts->cols / 2) - 1; j++) {
            if (xvalues[j] > xvalues[j + 1]) {
                tmp = xvalues[j];
                xvalues[j] = xvalues[j + 1];
                xvalues[j + 1] = tmp;
                changes++;
            }
        }
    }
    while (changes);

    /* mean of the two middle values (this assumes an even number of points) */
    float median = (xvalues[(pts->cols / 2) / 2] + xvalues[((pts->cols / 2) / 2) - 1]) / 2;

    return median;
}

/* splits the set of points into two subsets, using the median x coordinate */
void splitPoints(CvMat* pts_left, CvMat* pts_right, CvMat* pts, float median)
{
    int left_cnt = 0, right_cnt = 0;

    for (int i = 0; i < (pts->cols / 2); i++) {
        if (cvGetReal1D(pts, (2 * i)) < median && left_cnt < pts_left->rows) {
            cvSetReal2D(pts_left, left_cnt, 0, cvGetReal1D(pts, (2 * i)));
            cvSetReal2D(pts_left, left_cnt, 1, cvGetReal1D(pts, (2 * i) + 1));
            left_cnt++;
        }
        else {
            if (right_cnt < pts_right->rows) {
                cvSetReal2D(pts_right, right_cnt, 0, cvGetReal1D(pts, (2 * i)));
                cvSetReal2D(pts_right, right_cnt, 1, cvGetReal1D(pts, (2 * i) + 1));
                right_cnt++;
            }
        }
    }
}








/* puts pts_left and pts_right on the image and shows the result */
void pointsOnImage(IplImage* image, CvMat* pts_left, CvMat* pts_right, int frameno)
{
    int x, y, x_, y_;
    IplImage* image2 = cvCreateImage(cvGetSize(image), image->depth, image->nChannels);
    cvCopy(image, image2);

    for (int i = 0; i < pts_left->rows; i++) {
        /* round each coordinate to the nearest pixel */
        x = floor(cvGetReal2D(pts_left, i, 0) + 0.5);
        y = floor(cvGetReal2D(pts_left, i, 1) + 0.5);
        x_ = floor(cvGetReal2D(pts_right, i, 0) + 0.5);
        y_ = floor(cvGetReal2D(pts_right, i, 1) + 0.5);
        /* draw a 5x5 block around each point; only the column index is
           bounds-checked, so points are assumed to lie at least two rows
           inside the image */
        for (int j = 0; j < 5; j++) {
            unsigned char* row1 = &CV_IMAGE_ELEM(image2, unsigned char, y - 2 + j, 0);
            unsigned char* row2 = &CV_IMAGE_ELEM(image2, unsigned char, y_ - 2 + j, 0);
            for (int k = 0; k < 5; k++) {
                if ((x - 2 + k) < image2->width && (x - 2 + k) >= 0) {
                    row1[(3 * (x - 2 + k)) + 1] = 255;
                }
                if ((x_ - 2 + k) < image2->width && (x_ - 2 + k) >= 0) {
                    row2[(3 * (x_ - 2 + k))] = 255;
                    row2[(3 * (x_ - 2 + k)) + 1] = 255;
                    row2[(3 * (x_ - 2 + k)) + 2] = 100;
                }
            }
        }
    }

    cvShowImage("ptsOnImage", image2);
    cvWaitKey(1);

    char filename[100];

    if (inFrames_to_save(frameno)) {
        sprintf(filename, "ptsonimage%d.jpg", frameno);
        cvSaveImage(filename, image2);
    }

    cvReleaseImage(&image2);
}








/* prints out a matrix of points */
void printPts(CvMat* A)
{
    int rows = A->rows;
    for (int i = 0; i < rows; ++i) {
        printf("Row %d: %f %f\n", i, cvGetReal2D(A, i, 0), cvGetReal2D(A, i, 1));
    }
    printf("\n");
}








/* prints an OpenCV matrix */
void PrintMat(CvMat* A)
{
    int i, j;
    for (i = 0; i < A->rows; i++)
    {
        printf("\n");
        switch (CV_MAT_DEPTH(A->type))
        {
        case CV_32F:
        case CV_64F:
            for (j = 0; j < A->cols; j++)
                printf("%8.3f ", (float) cvGetReal2D(A, i, j));
            break;
        case CV_8U:
        case CV_16U:
            for (j = 0; j < A->cols; j++)
                printf("%6d", (int) cvGetReal2D(A, i, j));
            break;
        default:
            break;
        }
    }
    printf("\n");
}

/* puts pts_left and pts_right on a black background and shows the result */
void pointsOnBlack(IplImage* image, CvMat* pts_left, CvMat* pts_right, char* windowname,
                   int frameno)
{
    int x, y, x_, y_;
    /* canvas twice as wide as the frame, so that shifted points stay visible */
    IplImage* image2 = cvCreateImage(cvSize((image->width) * 2, image->height), image->depth,
                                     image->nChannels);
    cvZero(image2);

    /* vertical reference line; the constants assume 640x480 frames */
    cvLine(image2, cvPoint(640, 1), cvPoint(640, 479), CV_RGB(255, 255, 255), 1, 8, 0);

    for (int i = 0; i < pts_left->rows; i++) {
        x = floor(cvGetReal2D(pts_left, i, 0) + 0.5 + 640);
        y = floor(cvGetReal2D(pts_left, i, 1) + 0.5);
        x_ = floor(cvGetReal2D(pts_right, i, 0) + 0.5 + 640);
        y_ = floor(cvGetReal2D(pts_right, i, 1) + 0.5);
        for (int j = 0; j < 5; j++) {
            unsigned char* row1 = &CV_IMAGE_ELEM(image2, unsigned char, y - 2 + j, 0);
            unsigned char* row2 = &CV_IMAGE_ELEM(image2, unsigned char, y_ - 2 + j, 0);
            for (int k = 0; k < 5; k++) {
                if ((x - 2 + k) < image2->width && (x - 2 + k) >= 0) {
                    row1[(3 * (x - 2 + k)) + 1] = 255;
                }
                if ((x_ - 2 + k) < image2->width && (x_ - 2 + k) >= 0) {
                    row2[(3 * (x_ - 2 + k))] = 255;
                    row2[(3 * (x_ - 2 + k)) + 1] = 255;
                    row2[(3 * (x_ - 2 + k)) + 2] = 100;
                }
            }
        }
    }

    cvShowImage(windowname, image2);
    cvWaitKey(1);

    char filename[100];

    if (inFrames_to_save(frameno)) {
        sprintf(filename, "%s%d.jpg", windowname, frameno);
        cvSaveImage(filename, image2);
    }

    cvReleaseImage(&image2);
}

/* performs Procrustes analysis on A and B to find the transformation matrix, T, and the
   translation vector */
void doProcrustes(CvMat* T, CvMat* translation, CvMat* A, CvMat* B)
{
    CvMat* ptssrc = cvCreateMat(A->rows, 2, CV_64FC1);
    CvMat* ptsdst = cvCreateMat(B->rows, 2, CV_64FC1);

    cvCopy(A, ptssrc);
    cvCopy(B, ptsdst);

    /* record the centroids before the copies are centred */
    CvMat* srcmean = cvCreateMat(1, 2, CV_64FC1);
    CvMat* dstmean = cvCreateMat(1, 2, CV_64FC1);
    GetMean(ptssrc, srcmean);
    GetMean(ptsdst, dstmean);

    translate_to_origin(ptssrc);
    translate_to_origin(ptsdst);

    normalise(ptssrc);
    normalise(ptsdst);

    /* product = ptssrc^T * ptsdst (a 2x2 matrix) */
    CvMat* srctrans = cvCreateMat(2, ptssrc->rows, CV_64FC1);
    cvTranspose(ptssrc, srctrans);

    CvMat* product = cvCreateMat(2, 2, CV_64FC1);
    cvGEMM(srctrans, ptsdst, 1, NULL, 0, product, 0);

    CvMat* L = cvCreateMat(2, 2, CV_64FC1);
    CvMat* D = cvCreateMat(2, 2, CV_64FC1);
    CvMat* M = cvCreateMat(2, 2, CV_64FC1);

    /* singular value decomposition: product = L * D * M^T */
    cvSVD(product, D, L, M);

    /* orthogonal transformation T = M * L^T */
    CvMat* Ltrans = cvCreateMat(2, 2, CV_64FC1);
    cvTranspose(L, Ltrans);
    cvGEMM(M, Ltrans, 1, NULL, 0, T, 0);

    /* translation = srcmean - dstmean * T */
    CvMat* TproductmeanY = cvCreateMat(1, 2, CV_64FC1);
    cvGEMM(dstmean, T, 1, NULL, 0, TproductmeanY, 0);

    cvSub(srcmean, TproductmeanY, translation);
}

/* finds the centroid of a set of points */
void GetMean(CvMat* src, CvMat* dst)
{
    float sumx = 0, sumy = 0, avx = 0, avy = 0;

    for (int i = 0; i < src->rows; i++) {
        sumx = sumx + cvGetReal2D(src, i, 0);
        sumy = sumy + cvGetReal2D(src, i, 1);
    }

    avx = sumx / (src->rows);
    avy = sumy / (src->rows);

    cvSetReal1D(dst, 0, avx);
    cvSetReal1D(dst, 1, avy);
}

/* translates a set of points to the origin */
void translate_to_origin(CvMat* pts)
{
    float sumx = 0, sumy = 0, avx = 0, avy = 0;

    for (int i = 0; i < pts->rows; i++) {
        sumx = sumx + cvGetReal2D(pts, i, 0);
        sumy = sumy + cvGetReal2D(pts, i, 1);
    }

    avx = sumx / (pts->rows);
    avy = sumy / (pts->rows);

    for (int i = 0; i < pts->rows; i++) {
        cvSetReal2D(pts, i, 0, cvGetReal2D(pts, i, 0) - avx);
        cvSetReal2D(pts, i, 1, cvGetReal2D(pts, i, 1) - avy);
    }
}

/* normalises a set of points */
void normalise(CvMat* pts)
{
    float sumsq = 0;

    for (int i = 0; i < pts->rows; i++) {
        sumsq = sumsq + (cvGetReal2D(pts, i, 0) * cvGetReal2D(pts, i, 0));
        sumsq = sumsq + (cvGetReal2D(pts, i, 1) * cvGetReal2D(pts, i, 1));
    }

    sumsq = sqrt(sumsq);

    for (int i = 0; i < pts->rows; i++) {
        cvSetReal2D(pts, i, 0, cvGetReal2D(pts, i, 0) / sumsq);
        cvSetReal2D(pts, i, 1, cvGetReal2D(pts, i, 1) / sumsq);
    }
}

/* translates a set of points by the translation vector */
void translate(CvMat* src, CvMat* translation, CvMat* dst)
{
    for (int i = 0; i < src->rows; i++) {
        cvSetReal2D(dst, i, 0, cvGetReal2D(src, i, 0) + cvGetReal1D(translation, 0));
        cvSetReal2D(dst, i, 1, cvGetReal2D(src, i, 1) + cvGetReal1D(translation, 1));
    }
}

/* calculates the Procrustes distance between two sets of points, and writes the
   absolute x and y difference of each pair of points to f */
float ProcrustesDistance(CvMat* one, CvMat* two, FILE* f, int frameno)
{
    float sum = 0;
    for (int i = 0; i < one->rows; i++) {
        sum = sum + (pow(cvGetReal2D(one, i, 0) - cvGetReal2D(two, i, 0), 2));
        sum = sum + (pow(cvGetReal2D(one, i, 1) - cvGetReal2D(two, i, 1), 2));
        /* fabs rather than abs, so that the differences are not truncated to int */
        fprintf(f, "%d %f %f\n", frameno,
                fabs(cvGetReal2D(one, i, 0) - cvGetReal2D(two, i, 0)),
                fabs(cvGetReal2D(one, i, 1) - cvGetReal2D(two, i, 1)));
    }
    sum = sqrt(sum);
    return sum;
}

/* returns true if frame number a is listed in frames_to_save */
bool inFrames_to_save(int a)
{
    for (int i = 0; i < FRAMESTOPRINT; i++) {
        if (a == frames_to_save[i]) {
            return true;
        }
    }

    return false;
}

/* overlays a set of points on an image; points is a single row of interleaved
   x and y coordinates, and colour selects the channel (0, 1 or 2) to set */
void overlayPoints(IplImage* dst, IplImage* image, CvMat* points, int colour)
{
    int x, y;
    cvCopy(image, dst);

    for (int i = 0; i < points->cols; i = i + 2) {
        y = floor(cvGetReal1D(points, i + 1) + 0.5);
        x = floor(cvGetReal1D(points, i) + 0.5);
        /* draw a 5x5 square centred on (x, y), skipping pixels outside the image */
        for (int j = 0; j < 5; j++) {
            if ((y - 2 + j) < 0 || (y - 2 + j) >= dst->height) {
                continue;
            }
            unsigned char* row1 = &CV_IMAGE_ELEM(dst, unsigned char, y - 2 + j, 0);
            for (int k = 0; k < 5; k++) {
                if ((x - 2 + k) >= 0 && (x - 2 + k) < dst->width) {
                    row1[(3 * (x - 2 + k)) + colour] = 255;
                }
            }
        }
    }
}